* [PATCH v5 00/10] Stage-2 huge mappings for pKVM np-guests
@ 2025-05-20 8:51 Vincent Donnefort
2025-05-20 8:51 ` [PATCH v5 01/10] KVM: arm64: Handle huge mappings for np-guest CMOs Vincent Donnefort
` (9 more replies)
0 siblings, 10 replies; 14+ messages in thread
From: Vincent Donnefort @ 2025-05-20 8:51 UTC (permalink / raw)
To: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will
Cc: qperret, linux-arm-kernel, kvmarm, linux-kernel, kernel-team,
Vincent Donnefort
Hi all,
This series adds support for stage-2 huge mappings (PMD_SIZE) to pKVM
np-guests, that is, installing PMD-level mappings in the stage-2
whenever the stage-1 is backed by either hugetlbfs or THPs.
The last patch of this series optimizes CMOs by using a shared
PMD_SIZE fixmap.
Changes since v4 (https://lore.kernel.org/all/20250509131706.2336138-1-vdonnefort@google.com/)
- Make guest CMO functions work with any input (Marc)
- Improve for_each_hyp_page() and fold struct hyp_page declaration
(Marc)
- Replace "PMD_SIZE" in mem_protect.c with kvm_granule_size(KVM_PGTABLE_LAST_LEVEL - 1)
(Marc)
- Remove useless guest pgtable callback force_pte (Marc)
- Fixblock functions fall back on fixmap on 64KiB-page systems.
(Marc)
Changes since v3 (https://lore.kernel.org/all/20250407082706.1239603-1-vdonnefort@google.com/)
- Rebase on kvmarm/next
Changes since v2 (https://lore.kernel.org/all/20250306110038.3733649-1-vdonnefort@google.com/)
- Fix PUD_SIZE -> PMD_SIZE enforcement (Quentin)
- Rework pkvm_host_share_guest() to remove one hyp_page walk (Quentin)
- Remove one pgtable walk in __check_host_shared_guest() (Quentin)
- Return EBUSY on host_share_guest_count overflow
Changes since v1 (https://lore.kernel.org/all/20250228102530.1229089-1-vdonnefort@google.com/)
- WARN_ON() on !PAGE_ALIGNED size for guest CMOs (Quentin)
- Call check_range_allowed_memory() before accessing the vmemmap (Quentin)
Quentin Perret (2):
KVM: arm64: Convert pkvm_mappings to interval tree
KVM: arm64: Add a range to pkvm_mappings
Vincent Donnefort (8):
KVM: arm64: Handle huge mappings for np-guest CMOs
KVM: arm64: Introduce for_each_hyp_page
KVM: arm64: Add a range to __pkvm_host_share_guest()
KVM: arm64: Add a range to __pkvm_host_unshare_guest()
KVM: arm64: Add a range to __pkvm_host_wrprotect_guest()
KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest()
KVM: arm64: Stage-2 huge mappings for np-guests
KVM: arm64: np-guest CMOs with PMD_SIZE fixmap
arch/arm64/include/asm/kvm_pgtable.h | 7 +-
arch/arm64/include/asm/kvm_pkvm.h | 2 +
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 8 +-
arch/arm64/kvm/hyp/include/nvhe/memory.h | 16 +-
arch/arm64/kvm/hyp/include/nvhe/mm.h | 4 +-
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 16 +-
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 264 +++++++++++-------
arch/arm64/kvm/hyp/nvhe/mm.c | 97 ++++++-
arch/arm64/kvm/hyp/nvhe/setup.c | 15 +-
arch/arm64/kvm/hyp/pgtable.c | 6 -
arch/arm64/kvm/mmu.c | 6 +-
arch/arm64/kvm/pkvm.c | 129 ++++-----
12 files changed, 365 insertions(+), 205 deletions(-)
base-commit: c4e91ea0cc7e6345a4f7b8167e838d728ca86c30
--
2.49.0.1143.g0be31eac6b-goog
^ permalink raw reply [flat|nested] 14+ messages in thread
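For readers following the v4 to v5 change from "PMD_SIZE" to
kvm_granule_size(KVM_PGTABLE_LAST_LEVEL - 1), the arithmetic can be sketched
in user space as below. This is an illustration, not code from the series:
granule_shift(), granule_size() and block_nr_pages() are hypothetical
stand-ins, and the sketch assumes the usual arm64 layout of 8-byte
descriptors (each table level resolves page_shift - 3 bits) with level 3 as
the last level.

```c
#include <assert.h>
#include <stdint.h>

#define LAST_LEVEL 3 /* assumed: level 3 maps single pages */

/* Hypothetical stand-in: address bits covered by one entry at @level. */
static unsigned int granule_shift(unsigned int page_shift, int level)
{
	assert(level <= LAST_LEVEL);
	/* Each level resolves (page_shift - 3) bits: 8-byte descriptors. */
	return (page_shift - 3) * (LAST_LEVEL + 1 - level) + 3;
}

/* Bytes mapped by one entry at @level. */
static uint64_t granule_size(unsigned int page_shift, int level)
{
	return UINT64_C(1) << granule_shift(page_shift, level);
}

/* Number of pages covered by one block mapping at LAST_LEVEL - 1. */
static uint64_t block_nr_pages(unsigned int page_shift)
{
	return granule_size(page_shift, LAST_LEVEL - 1) >> page_shift;
}
```

Under these assumptions, 4KiB pages give a 2MiB block (512 pages), while
64KiB pages give a 512MiB block at the same level.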
* [PATCH v5 01/10] KVM: arm64: Handle huge mappings for np-guest CMOs
2025-05-20 8:51 [PATCH v5 00/10] Stage-2 huge mappings for pKVM np-guests Vincent Donnefort
@ 2025-05-20 8:51 ` Vincent Donnefort
2025-05-20 8:51 ` [PATCH v5 02/10] KVM: arm64: Introduce for_each_hyp_page Vincent Donnefort
` (8 subsequent siblings)
9 siblings, 0 replies; 14+ messages in thread
From: Vincent Donnefort @ 2025-05-20 8:51 UTC (permalink / raw)
To: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will
Cc: qperret, linux-arm-kernel, kvmarm, linux-kernel, kernel-team,
Vincent Donnefort
clean_dcache_guest_page() and invalidate_icache_guest_page() accept a
size as an argument. But they also rely on fixmap, which can only map a
single PAGE_SIZE page.
With the upcoming stage-2 huge mappings for pKVM np-guests, those
callbacks will get size > PAGE_SIZE. Loop the CMOs on a PAGE_SIZE basis
until the whole range is done.
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 31173c694695..be4f7c5612f8 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -219,14 +219,32 @@ static void guest_s2_put_page(void *addr)
static void clean_dcache_guest_page(void *va, size_t size)
{
- __clean_dcache_guest_page(hyp_fixmap_map(__hyp_pa(va)), size);
- hyp_fixmap_unmap();
+ size += va - PTR_ALIGN_DOWN(va, PAGE_SIZE);
+ va = PTR_ALIGN_DOWN(va, PAGE_SIZE);
+ size = PAGE_ALIGN(size);
+
+ while (size) {
+ __clean_dcache_guest_page(hyp_fixmap_map(__hyp_pa(va)),
+ PAGE_SIZE);
+ hyp_fixmap_unmap();
+ va += PAGE_SIZE;
+ size -= PAGE_SIZE;
+ }
}
static void invalidate_icache_guest_page(void *va, size_t size)
{
- __invalidate_icache_guest_page(hyp_fixmap_map(__hyp_pa(va)), size);
- hyp_fixmap_unmap();
+ size += va - PTR_ALIGN_DOWN(va, PAGE_SIZE);
+ va = PTR_ALIGN_DOWN(va, PAGE_SIZE);
+ size = PAGE_ALIGN(size);
+
+ while (size) {
+ __invalidate_icache_guest_page(hyp_fixmap_map(__hyp_pa(va)),
+ PAGE_SIZE);
+ hyp_fixmap_unmap();
+ va += PAGE_SIZE;
+ size -= PAGE_SIZE;
+ }
}
int kvm_guest_prepare_stage2(struct pkvm_hyp_vm *vm, void *pgd)
--
2.49.0.1143.g0be31eac6b-goog
^ permalink raw reply related [flat|nested] 14+ messages in thread
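The range widening done at the top of both CMO helpers in patch 01 can be
modelled outside the hypervisor as follows. This is only a sketch:
cmo_page_span() and SKETCH_PAGE_SIZE are hypothetical names, a 4KiB page is
assumed, and plain integers stand in for the hyp VA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE ((uintptr_t)4096) /* assumed page size */

/*
 * Hypothetical helper: widen [va, va + size) to page boundaries.
 * Stores the page-aligned start in *start and returns the number of
 * pages a per-page loop has to visit.
 */
static size_t cmo_page_span(uintptr_t va, size_t size, uintptr_t *start)
{
	uintptr_t aligned = va & ~(SKETCH_PAGE_SIZE - 1);

	assert(start);
	/* Account for the offset of va within its first page. */
	size += va - aligned;
	/* Round the length up to a whole number of pages. */
	size = (size + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
	*start = aligned;
	return size / SKETCH_PAGE_SIZE;
}
```

A 0x20-byte range starting at 0x1ff0 straddles a page boundary, so the
helper reports two pages starting at 0x1000; this is the case the "any
input" rework is meant to cover.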
* [PATCH v5 02/10] KVM: arm64: Introduce for_each_hyp_page
2025-05-20 8:51 [PATCH v5 00/10] Stage-2 huge mappings for pKVM np-guests Vincent Donnefort
2025-05-20 8:51 ` [PATCH v5 01/10] KVM: arm64: Handle huge mappings for np-guest CMOs Vincent Donnefort
@ 2025-05-20 8:51 ` Vincent Donnefort
2025-05-20 8:51 ` [PATCH v5 03/10] KVM: arm64: Add a range to __pkvm_host_share_guest() Vincent Donnefort
` (7 subsequent siblings)
9 siblings, 0 replies; 14+ messages in thread
From: Vincent Donnefort @ 2025-05-20 8:51 UTC (permalink / raw)
To: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will
Cc: qperret, linux-arm-kernel, kvmarm, linux-kernel, kernel-team,
Vincent Donnefort
Add a helper to iterate over the hypervisor vmemmap. This will be
particularly handy with the introduction of huge mapping support
for the np-guest stage-2.
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
index eb0c2ebd1743..dee1a406b0c2 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
@@ -96,24 +96,24 @@ static inline struct hyp_page *hyp_phys_to_page(phys_addr_t phys)
#define hyp_page_to_virt(page) __hyp_va(hyp_page_to_phys(page))
#define hyp_page_to_pool(page) (((struct hyp_page *)page)->pool)
-static inline enum pkvm_page_state get_host_state(phys_addr_t phys)
+static inline enum pkvm_page_state get_host_state(struct hyp_page *p)
{
- return (enum pkvm_page_state)hyp_phys_to_page(phys)->__host_state;
+ return p->__host_state;
}
-static inline void set_host_state(phys_addr_t phys, enum pkvm_page_state state)
+static inline void set_host_state(struct hyp_page *p, enum pkvm_page_state state)
{
- hyp_phys_to_page(phys)->__host_state = state;
+ p->__host_state = state;
}
-static inline enum pkvm_page_state get_hyp_state(phys_addr_t phys)
+static inline enum pkvm_page_state get_hyp_state(struct hyp_page *p)
{
- return hyp_phys_to_page(phys)->__hyp_state_comp ^ PKVM_PAGE_STATE_MASK;
+ return p->__hyp_state_comp ^ PKVM_PAGE_STATE_MASK;
}
-static inline void set_hyp_state(phys_addr_t phys, enum pkvm_page_state state)
+static inline void set_hyp_state(struct hyp_page *p, enum pkvm_page_state state)
{
- hyp_phys_to_page(phys)->__hyp_state_comp = state ^ PKVM_PAGE_STATE_MASK;
+ p->__hyp_state_comp = state ^ PKVM_PAGE_STATE_MASK;
}
/*
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index be4f7c5612f8..1018a6f66359 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -60,6 +60,11 @@ static void hyp_unlock_component(void)
hyp_spin_unlock(&pkvm_pgd_lock);
}
+#define for_each_hyp_page(__p, __st, __sz) \
+ for (struct hyp_page *__p = hyp_phys_to_page(__st), \
+ *__e = __p + ((__sz) >> PAGE_SHIFT); \
+ __p < __e; __p++)
+
static void *host_s2_zalloc_pages_exact(size_t size)
{
void *addr = hyp_alloc_pages(&host_s2_pool, get_order(size));
@@ -485,7 +490,8 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
return -EAGAIN;
if (pte) {
- WARN_ON(addr_is_memory(addr) && get_host_state(addr) != PKVM_NOPAGE);
+ WARN_ON(addr_is_memory(addr) &&
+ get_host_state(hyp_phys_to_page(addr)) != PKVM_NOPAGE);
return -EPERM;
}
@@ -511,10 +517,8 @@ int host_stage2_idmap_locked(phys_addr_t addr, u64 size,
static void __host_update_page_state(phys_addr_t addr, u64 size, enum pkvm_page_state state)
{
- phys_addr_t end = addr + size;
-
- for (; addr < end; addr += PAGE_SIZE)
- set_host_state(addr, state);
+ for_each_hyp_page(page, addr, size)
+ set_host_state(page, state);
}
int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
@@ -636,16 +640,16 @@ static int check_page_state_range(struct kvm_pgtable *pgt, u64 addr, u64 size,
static int __host_check_page_state_range(u64 addr, u64 size,
enum pkvm_page_state state)
{
- u64 end = addr + size;
int ret;
- ret = check_range_allowed_memory(addr, end);
+ ret = check_range_allowed_memory(addr, addr + size);
if (ret)
return ret;
hyp_assert_lock_held(&host_mmu.lock);
- for (; addr < end; addr += PAGE_SIZE) {
- if (get_host_state(addr) != state)
+
+ for_each_hyp_page(page, addr, size) {
+ if (get_host_state(page) != state)
return -EPERM;
}
@@ -655,7 +659,7 @@ static int __host_check_page_state_range(u64 addr, u64 size,
static int __host_set_page_state_range(u64 addr, u64 size,
enum pkvm_page_state state)
{
- if (get_host_state(addr) == PKVM_NOPAGE) {
+ if (get_host_state(hyp_phys_to_page(addr)) == PKVM_NOPAGE) {
int ret = host_stage2_idmap_locked(addr, size, PKVM_HOST_MEM_PROT);
if (ret)
@@ -669,18 +673,14 @@ static int __host_set_page_state_range(u64 addr, u64 size,
static void __hyp_set_page_state_range(phys_addr_t phys, u64 size, enum pkvm_page_state state)
{
- phys_addr_t end = phys + size;
-
- for (; phys < end; phys += PAGE_SIZE)
- set_hyp_state(phys, state);
+ for_each_hyp_page(page, phys, size)
+ set_hyp_state(page, state);
}
static int __hyp_check_page_state_range(phys_addr_t phys, u64 size, enum pkvm_page_state state)
{
- phys_addr_t end = phys + size;
-
- for (; phys < end; phys += PAGE_SIZE) {
- if (get_hyp_state(phys) != state)
+ for_each_hyp_page(page, phys, size) {
+ if (get_hyp_state(page) != state)
return -EPERM;
}
@@ -931,7 +931,7 @@ int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu,
goto unlock;
page = hyp_phys_to_page(phys);
- switch (get_host_state(phys)) {
+ switch (get_host_state(page)) {
case PKVM_PAGE_OWNED:
WARN_ON(__host_set_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_SHARED_OWNED));
break;
@@ -983,9 +983,9 @@ static int __check_host_shared_guest(struct pkvm_hyp_vm *vm, u64 *__phys, u64 ip
if (WARN_ON(ret))
return ret;
- if (get_host_state(phys) != PKVM_PAGE_SHARED_OWNED)
- return -EPERM;
page = hyp_phys_to_page(phys);
+ if (get_host_state(page) != PKVM_PAGE_SHARED_OWNED)
+ return -EPERM;
if (WARN_ON(!page->host_share_guest_count))
return -EINVAL;
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 6d513a4b3763..c19860fc8183 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -190,6 +190,7 @@ static int fix_host_ownership_walker(const struct kvm_pgtable_visit_ctx *ctx,
enum kvm_pgtable_walk_flags visit)
{
enum pkvm_page_state state;
+ struct hyp_page *page;
phys_addr_t phys;
if (!kvm_pte_valid(ctx->old))
@@ -202,6 +203,8 @@ static int fix_host_ownership_walker(const struct kvm_pgtable_visit_ctx *ctx,
if (!addr_is_memory(phys))
return -EINVAL;
+ page = hyp_phys_to_page(phys);
+
/*
* Adjust the host stage-2 mappings to match the ownership attributes
* configured in the hypervisor stage-1, and make sure to propagate them
@@ -210,15 +213,15 @@ static int fix_host_ownership_walker(const struct kvm_pgtable_visit_ctx *ctx,
state = pkvm_getstate(kvm_pgtable_hyp_pte_prot(ctx->old));
switch (state) {
case PKVM_PAGE_OWNED:
- set_hyp_state(phys, PKVM_PAGE_OWNED);
+ set_hyp_state(page, PKVM_PAGE_OWNED);
return host_stage2_set_owner_locked(phys, PAGE_SIZE, PKVM_ID_HYP);
case PKVM_PAGE_SHARED_OWNED:
- set_hyp_state(phys, PKVM_PAGE_SHARED_OWNED);
- set_host_state(phys, PKVM_PAGE_SHARED_BORROWED);
+ set_hyp_state(page, PKVM_PAGE_SHARED_OWNED);
+ set_host_state(page, PKVM_PAGE_SHARED_BORROWED);
break;
case PKVM_PAGE_SHARED_BORROWED:
- set_hyp_state(phys, PKVM_PAGE_SHARED_BORROWED);
- set_host_state(phys, PKVM_PAGE_SHARED_OWNED);
+ set_hyp_state(page, PKVM_PAGE_SHARED_BORROWED);
+ set_host_state(page, PKVM_PAGE_SHARED_OWNED);
break;
default:
return -EINVAL;
--
2.49.0.1143.g0be31eac6b-goog
^ permalink raw reply related [flat|nested] 14+ messages in thread
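The iterator introduced in patch 02 can be tried in user space with a toy
vmemmap. Everything prefixed sk_ or SK_ below is hypothetical and exists
only for illustration: the sketch assumes a flat array of descriptors
starting at physical address 0 and a 4KiB page.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PAGE_SHIFT 12
#define SK_NR_PAGES   1024

enum sk_state { SK_OWNED, SK_SHARED_OWNED, SK_NOPAGE };

struct sk_page {
	enum sk_state host_state;
};

/* Toy vmemmap: one descriptor per physical page, starting at address 0. */
static struct sk_page sk_vmemmap[SK_NR_PAGES];

static struct sk_page *sk_phys_to_page(uint64_t phys)
{
	assert((phys >> SK_PAGE_SHIFT) < SK_NR_PAGES);
	return &sk_vmemmap[phys >> SK_PAGE_SHIFT];
}

/* Same shape as the patch's iterator: the cursor is scoped to the loop. */
#define sk_for_each_page(p, st, sz)					\
	for (struct sk_page *p = sk_phys_to_page(st),			\
	     *sk_end_ = p + ((sz) >> SK_PAGE_SHIFT);			\
	     p < sk_end_; p++)

static void sk_set_range(uint64_t phys, uint64_t size, enum sk_state st)
{
	sk_for_each_page(page, phys, size)
		page->host_state = st;
}

/* Returns 0 if every page in the range is in state @st, -1 otherwise. */
static int sk_check_range(uint64_t phys, uint64_t size, enum sk_state st)
{
	sk_for_each_page(page, phys, size) {
		if (page->host_state != st)
			return -1;
	}
	return 0;
}
```

Walking descriptors directly means the phys-to-page translation happens once
per range rather than once per page, which is what the accessor signature
change in memory.h enables.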
* [PATCH v5 03/10] KVM: arm64: Add a range to __pkvm_host_share_guest()
2025-05-20 8:51 [PATCH v5 00/10] Stage-2 huge mappings for pKVM np-guests Vincent Donnefort
2025-05-20 8:51 ` [PATCH v5 01/10] KVM: arm64: Handle huge mappings for np-guest CMOs Vincent Donnefort
2025-05-20 8:51 ` [PATCH v5 02/10] KVM: arm64: Introduce for_each_hyp_page Vincent Donnefort
@ 2025-05-20 8:51 ` Vincent Donnefort
2025-05-20 8:51 ` [PATCH v5 04/10] KVM: arm64: Add a range to __pkvm_host_unshare_guest() Vincent Donnefort
` (6 subsequent siblings)
9 siblings, 0 replies; 14+ messages in thread
From: Vincent Donnefort @ 2025-05-20 8:51 UTC (permalink / raw)
To: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will
Cc: qperret, linux-arm-kernel, kvmarm, linux-kernel, kernel-team,
Vincent Donnefort
In preparation for supporting stage-2 huge mappings for np-guests, add a
nr_pages argument to the __pkvm_host_share_guest hypercall. This range
supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is, 512 on a
4K-page system).
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 26016eb9323f..47aa7b01114f 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -39,7 +39,7 @@ int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages);
int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
-int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu,
+int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu *vcpu,
enum kvm_pgtable_prot prot);
int __pkvm_host_unshare_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm);
int __pkvm_host_relax_perms_guest(u64 gfn, struct pkvm_hyp_vcpu *vcpu, enum kvm_pgtable_prot prot);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 59db9606e6e1..4d3d215955c3 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -245,7 +245,8 @@ static void handle___pkvm_host_share_guest(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(u64, pfn, host_ctxt, 1);
DECLARE_REG(u64, gfn, host_ctxt, 2);
- DECLARE_REG(enum kvm_pgtable_prot, prot, host_ctxt, 3);
+ DECLARE_REG(u64, nr_pages, host_ctxt, 3);
+ DECLARE_REG(enum kvm_pgtable_prot, prot, host_ctxt, 4);
struct pkvm_hyp_vcpu *hyp_vcpu;
int ret = -EINVAL;
@@ -260,7 +261,7 @@ static void handle___pkvm_host_share_guest(struct kvm_cpu_context *host_ctxt)
if (ret)
goto out;
- ret = __pkvm_host_share_guest(pfn, gfn, hyp_vcpu, prot);
+ ret = __pkvm_host_share_guest(pfn, gfn, nr_pages, hyp_vcpu, prot);
out:
cpu_reg(host_ctxt, 1) = ret;
}
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 1018a6f66359..8e0847aa090d 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -695,10 +695,9 @@ static enum pkvm_page_state guest_get_page_state(kvm_pte_t pte, u64 addr)
return pkvm_getstate(kvm_pgtable_stage2_pte_prot(pte));
}
-static int __guest_check_page_state_range(struct pkvm_hyp_vcpu *vcpu, u64 addr,
+static int __guest_check_page_state_range(struct pkvm_hyp_vm *vm, u64 addr,
u64 size, enum pkvm_page_state state)
{
- struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
struct check_walk_data d = {
.desired = state,
.get_page_state = guest_get_page_state,
@@ -907,48 +906,72 @@ int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages)
return ret;
}
-int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu,
+static int __guest_check_transition_size(u64 phys, u64 ipa, u64 nr_pages, u64 *size)
+{
+ if (nr_pages == 1) {
+ *size = PAGE_SIZE;
+ return 0;
+ }
+
+ return -EINVAL;
+}
+
+int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu *vcpu,
enum kvm_pgtable_prot prot)
{
struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
u64 phys = hyp_pfn_to_phys(pfn);
u64 ipa = hyp_pfn_to_phys(gfn);
- struct hyp_page *page;
+ u64 size;
int ret;
if (prot & ~KVM_PGTABLE_PROT_RWX)
return -EINVAL;
- ret = check_range_allowed_memory(phys, phys + PAGE_SIZE);
+ ret = __guest_check_transition_size(phys, ipa, nr_pages, &size);
+ if (ret)
+ return ret;
+
+ ret = check_range_allowed_memory(phys, phys + size);
if (ret)
return ret;
host_lock_component();
guest_lock_component(vm);
- ret = __guest_check_page_state_range(vcpu, ipa, PAGE_SIZE, PKVM_NOPAGE);
+ ret = __guest_check_page_state_range(vm, ipa, size, PKVM_NOPAGE);
if (ret)
goto unlock;
- page = hyp_phys_to_page(phys);
- switch (get_host_state(page)) {
- case PKVM_PAGE_OWNED:
- WARN_ON(__host_set_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_SHARED_OWNED));
- break;
- case PKVM_PAGE_SHARED_OWNED:
- if (page->host_share_guest_count)
- break;
- /* Only host to np-guest multi-sharing is tolerated */
- fallthrough;
- default:
- ret = -EPERM;
- goto unlock;
+ for_each_hyp_page(page, phys, size) {
+ switch (get_host_state(page)) {
+ case PKVM_PAGE_OWNED:
+ continue;
+ case PKVM_PAGE_SHARED_OWNED:
+ if (page->host_share_guest_count == U32_MAX) {
+ ret = -EBUSY;
+ goto unlock;
+ }
+
+ /* Only host to np-guest multi-sharing is tolerated */
+ if (page->host_share_guest_count)
+ continue;
+
+ fallthrough;
+ default:
+ ret = -EPERM;
+ goto unlock;
+ }
}
- WARN_ON(kvm_pgtable_stage2_map(&vm->pgt, ipa, PAGE_SIZE, phys,
+ for_each_hyp_page(page, phys, size) {
+ set_host_state(page, PKVM_PAGE_SHARED_OWNED);
+ page->host_share_guest_count++;
+ }
+
+ WARN_ON(kvm_pgtable_stage2_map(&vm->pgt, ipa, size, phys,
pkvm_mkstate(prot, PKVM_PAGE_SHARED_BORROWED),
&vcpu->vcpu.arch.pkvm_memcache, 0));
- page->host_share_guest_count++;
unlock:
guest_unlock_component(vm);
@@ -1169,6 +1192,9 @@ static void assert_page_state(void)
struct pkvm_hyp_vcpu *vcpu = &selftest_vcpu;
u64 phys = hyp_virt_to_phys(virt);
u64 ipa[2] = { selftest_ipa(), selftest_ipa() + PAGE_SIZE };
+ struct pkvm_hyp_vm *vm;
+
+ vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
host_lock_component();
WARN_ON(__host_check_page_state_range(phys, size, selftest_state.host));
@@ -1179,8 +1205,8 @@ static void assert_page_state(void)
hyp_unlock_component();
guest_lock_component(&selftest_vm);
- WARN_ON(__guest_check_page_state_range(vcpu, ipa[0], size, selftest_state.guest[0]));
- WARN_ON(__guest_check_page_state_range(vcpu, ipa[1], size, selftest_state.guest[1]));
+ WARN_ON(__guest_check_page_state_range(vm, ipa[0], size, selftest_state.guest[0]));
+ WARN_ON(__guest_check_page_state_range(vm, ipa[1], size, selftest_state.guest[1]));
guest_unlock_component(&selftest_vm);
}
@@ -1218,7 +1244,7 @@ void pkvm_ownership_selftest(void *base)
assert_transition_res(-EPERM, __pkvm_host_share_ffa, pfn, 1);
assert_transition_res(-EPERM, __pkvm_host_unshare_ffa, pfn, 1);
assert_transition_res(-EPERM, hyp_pin_shared_mem, virt, virt + size);
- assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, vcpu, prot);
+ assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
assert_transition_res(-ENOENT, __pkvm_host_unshare_guest, gfn, vm);
selftest_state.host = PKVM_PAGE_OWNED;
@@ -1237,7 +1263,7 @@ void pkvm_ownership_selftest(void *base)
assert_transition_res(-EPERM, __pkvm_host_donate_hyp, pfn, 1);
assert_transition_res(-EPERM, __pkvm_host_share_ffa, pfn, 1);
assert_transition_res(-EPERM, __pkvm_hyp_donate_host, pfn, 1);
- assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, vcpu, prot);
+ assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
assert_transition_res(-ENOENT, __pkvm_host_unshare_guest, gfn, vm);
assert_transition_res(0, hyp_pin_shared_mem, virt, virt + size);
@@ -1249,7 +1275,7 @@ void pkvm_ownership_selftest(void *base)
assert_transition_res(-EPERM, __pkvm_host_donate_hyp, pfn, 1);
assert_transition_res(-EPERM, __pkvm_host_share_ffa, pfn, 1);
assert_transition_res(-EPERM, __pkvm_hyp_donate_host, pfn, 1);
- assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, vcpu, prot);
+ assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
assert_transition_res(-ENOENT, __pkvm_host_unshare_guest, gfn, vm);
hyp_unpin_shared_mem(virt, virt + size);
@@ -1268,7 +1294,7 @@ void pkvm_ownership_selftest(void *base)
assert_transition_res(-EPERM, __pkvm_host_share_hyp, pfn);
assert_transition_res(-EPERM, __pkvm_host_unshare_hyp, pfn);
assert_transition_res(-EPERM, __pkvm_hyp_donate_host, pfn, 1);
- assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, vcpu, prot);
+ assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
assert_transition_res(-ENOENT, __pkvm_host_unshare_guest, gfn, vm);
assert_transition_res(-EPERM, hyp_pin_shared_mem, virt, virt + size);
@@ -1279,8 +1305,8 @@ void pkvm_ownership_selftest(void *base)
selftest_state.host = PKVM_PAGE_SHARED_OWNED;
selftest_state.guest[0] = PKVM_PAGE_SHARED_BORROWED;
- assert_transition_res(0, __pkvm_host_share_guest, pfn, gfn, vcpu, prot);
- assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, vcpu, prot);
+ assert_transition_res(0, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
+ assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
assert_transition_res(-EPERM, __pkvm_host_share_ffa, pfn, 1);
assert_transition_res(-EPERM, __pkvm_host_donate_hyp, pfn, 1);
assert_transition_res(-EPERM, __pkvm_host_share_hyp, pfn);
@@ -1289,7 +1315,7 @@ void pkvm_ownership_selftest(void *base)
assert_transition_res(-EPERM, hyp_pin_shared_mem, virt, virt + size);
selftest_state.guest[1] = PKVM_PAGE_SHARED_BORROWED;
- assert_transition_res(0, __pkvm_host_share_guest, pfn, gfn + 1, vcpu, prot);
+ assert_transition_res(0, __pkvm_host_share_guest, pfn, gfn + 1, 1, vcpu, prot);
WARN_ON(hyp_virt_to_page(virt)->host_share_guest_count != 2);
selftest_state.guest[0] = PKVM_NOPAGE;
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 83a737484046..0285e2cd2e7f 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -347,7 +347,7 @@ int pkvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
return -EINVAL;
lockdep_assert_held_write(&kvm->mmu_lock);
- ret = kvm_call_hyp_nvhe(__pkvm_host_share_guest, pfn, gfn, prot);
+ ret = kvm_call_hyp_nvhe(__pkvm_host_share_guest, pfn, gfn, 1, prot);
if (ret) {
/* Is the gfn already mapped due to a racing vCPU? */
if (ret == -EPERM)
--
2.49.0.1143.g0be31eac6b-goog
^ permalink raw reply related [flat|nested] 14+ messages in thread
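The reworked share path in patch 03 checks the whole range first and only
then updates state and counters, so a failure part-way through leaves
nothing to roll back. A minimal user-space sketch of that pattern follows;
the toy_ names are hypothetical, and locking and the stage-2 map call are
deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum toy_state { TOY_OWNED, TOY_SHARED_OWNED, TOY_OTHER };

struct toy_page {
	enum toy_state state;
	uint32_t share_count;
};

/* Share @nr pages starting at @pages; all-or-nothing. */
static int toy_share_range(struct toy_page *pages, size_t nr)
{
	size_t i;

	assert(pages);

	/* Pass 1: validate only, so a failure leaves every page untouched. */
	for (i = 0; i < nr; i++) {
		switch (pages[i].state) {
		case TOY_OWNED:
			continue;
		case TOY_SHARED_OWNED:
			/* Refuse to wrap the counter. */
			if (pages[i].share_count == UINT32_MAX)
				return -EBUSY;
			/* Already shared through this path: allowed again. */
			if (pages[i].share_count)
				continue;
			return -EPERM;
		default:
			return -EPERM;
		}
	}

	/* Pass 2: commit; cannot fail after the checks above. */
	for (i = 0; i < nr; i++) {
		pages[i].state = TOY_SHARED_OWNED;
		pages[i].share_count++;
	}
	return 0;
}
```

The second walk costs an extra pass over the descriptors, but it keeps the
error paths free of partial updates.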
* [PATCH v5 04/10] KVM: arm64: Add a range to __pkvm_host_unshare_guest()
2025-05-20 8:51 [PATCH v5 00/10] Stage-2 huge mappings for pKVM np-guests Vincent Donnefort
` (2 preceding siblings ...)
2025-05-20 8:51 ` [PATCH v5 03/10] KVM: arm64: Add a range to __pkvm_host_share_guest() Vincent Donnefort
@ 2025-05-20 8:51 ` Vincent Donnefort
2025-05-20 8:51 ` [PATCH v5 05/10] KVM: arm64: Add a range to __pkvm_host_wrprotect_guest() Vincent Donnefort
` (5 subsequent siblings)
9 siblings, 0 replies; 14+ messages in thread
From: Vincent Donnefort @ 2025-05-20 8:51 UTC (permalink / raw)
To: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will
Cc: qperret, linux-arm-kernel, kvmarm, linux-kernel, kernel-team,
Vincent Donnefort
In preparation for supporting stage-2 huge mappings for np-guests, add a
nr_pages argument to the __pkvm_host_unshare_guest hypercall. This range
supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is, 512 on a
4K-page system).
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 47aa7b01114f..19671edbe18f 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -41,7 +41,7 @@ int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu *vcpu,
enum kvm_pgtable_prot prot);
-int __pkvm_host_unshare_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm);
+int __pkvm_host_unshare_guest(u64 gfn, u64 nr_pages, struct pkvm_hyp_vm *hyp_vm);
int __pkvm_host_relax_perms_guest(u64 gfn, struct pkvm_hyp_vcpu *vcpu, enum kvm_pgtable_prot prot);
int __pkvm_host_wrprotect_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm);
int __pkvm_host_test_clear_young_guest(u64 gfn, bool mkold, struct pkvm_hyp_vm *vm);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 4d3d215955c3..5c03bd1db873 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -270,6 +270,7 @@ static void handle___pkvm_host_unshare_guest(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
DECLARE_REG(u64, gfn, host_ctxt, 2);
+ DECLARE_REG(u64, nr_pages, host_ctxt, 3);
struct pkvm_hyp_vm *hyp_vm;
int ret = -EINVAL;
@@ -280,7 +281,7 @@ static void handle___pkvm_host_unshare_guest(struct kvm_cpu_context *host_ctxt)
if (!hyp_vm)
goto out;
- ret = __pkvm_host_unshare_guest(gfn, hyp_vm);
+ ret = __pkvm_host_unshare_guest(gfn, nr_pages, hyp_vm);
put_pkvm_hyp_vm(hyp_vm);
out:
cpu_reg(host_ctxt, 1) = ret;
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 8e0847aa090d..884e2316aa48 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -980,10 +980,9 @@ int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu
return ret;
}
-static int __check_host_shared_guest(struct pkvm_hyp_vm *vm, u64 *__phys, u64 ipa)
+static int __check_host_shared_guest(struct pkvm_hyp_vm *vm, u64 *__phys, u64 ipa, u64 size)
{
enum pkvm_page_state state;
- struct hyp_page *page;
kvm_pte_t pte;
u64 phys;
s8 level;
@@ -994,7 +993,7 @@ static int __check_host_shared_guest(struct pkvm_hyp_vm *vm, u64 *__phys, u64 ip
return ret;
if (!kvm_pte_valid(pte))
return -ENOENT;
- if (level != KVM_PGTABLE_LAST_LEVEL)
+ if (kvm_granule_size(level) != size)
return -E2BIG;
state = guest_get_page_state(pte, ipa);
@@ -1002,43 +1001,49 @@ static int __check_host_shared_guest(struct pkvm_hyp_vm *vm, u64 *__phys, u64 ip
return -EPERM;
phys = kvm_pte_to_phys(pte);
- ret = check_range_allowed_memory(phys, phys + PAGE_SIZE);
+ ret = check_range_allowed_memory(phys, phys + size);
if (WARN_ON(ret))
return ret;
- page = hyp_phys_to_page(phys);
- if (get_host_state(page) != PKVM_PAGE_SHARED_OWNED)
- return -EPERM;
- if (WARN_ON(!page->host_share_guest_count))
- return -EINVAL;
+ for_each_hyp_page(page, phys, size) {
+ if (get_host_state(page) != PKVM_PAGE_SHARED_OWNED)
+ return -EPERM;
+ if (WARN_ON(!page->host_share_guest_count))
+ return -EINVAL;
+ }
*__phys = phys;
return 0;
}
-int __pkvm_host_unshare_guest(u64 gfn, struct pkvm_hyp_vm *vm)
+int __pkvm_host_unshare_guest(u64 gfn, u64 nr_pages, struct pkvm_hyp_vm *vm)
{
u64 ipa = hyp_pfn_to_phys(gfn);
- struct hyp_page *page;
- u64 phys;
+ u64 size, phys;
int ret;
+ ret = __guest_check_transition_size(0, ipa, nr_pages, &size);
+ if (ret)
+ return ret;
+
host_lock_component();
guest_lock_component(vm);
- ret = __check_host_shared_guest(vm, &phys, ipa);
+ ret = __check_host_shared_guest(vm, &phys, ipa, size);
if (ret)
goto unlock;
- ret = kvm_pgtable_stage2_unmap(&vm->pgt, ipa, PAGE_SIZE);
+ ret = kvm_pgtable_stage2_unmap(&vm->pgt, ipa, size);
if (ret)
goto unlock;
- page = hyp_phys_to_page(phys);
- page->host_share_guest_count--;
- if (!page->host_share_guest_count)
- WARN_ON(__host_set_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_OWNED));
+ for_each_hyp_page(page, phys, size) {
+ /* __check_host_shared_guest() protects against underflow */
+ page->host_share_guest_count--;
+ if (!page->host_share_guest_count)
+ set_host_state(page, PKVM_PAGE_OWNED);
+ }
unlock:
guest_unlock_component(vm);
@@ -1058,7 +1063,7 @@ static void assert_host_shared_guest(struct pkvm_hyp_vm *vm, u64 ipa)
host_lock_component();
guest_lock_component(vm);
- ret = __check_host_shared_guest(vm, &phys, ipa);
+ ret = __check_host_shared_guest(vm, &phys, ipa, PAGE_SIZE);
guest_unlock_component(vm);
host_unlock_component();
@@ -1245,7 +1250,7 @@ void pkvm_ownership_selftest(void *base)
assert_transition_res(-EPERM, __pkvm_host_unshare_ffa, pfn, 1);
assert_transition_res(-EPERM, hyp_pin_shared_mem, virt, virt + size);
assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
- assert_transition_res(-ENOENT, __pkvm_host_unshare_guest, gfn, vm);
+ assert_transition_res(-ENOENT, __pkvm_host_unshare_guest, gfn, 1, vm);
selftest_state.host = PKVM_PAGE_OWNED;
selftest_state.hyp = PKVM_NOPAGE;
@@ -1253,7 +1258,7 @@ void pkvm_ownership_selftest(void *base)
assert_transition_res(-EPERM, __pkvm_hyp_donate_host, pfn, 1);
assert_transition_res(-EPERM, __pkvm_host_unshare_hyp, pfn);
assert_transition_res(-EPERM, __pkvm_host_unshare_ffa, pfn, 1);
- assert_transition_res(-ENOENT, __pkvm_host_unshare_guest, gfn, vm);
+ assert_transition_res(-ENOENT, __pkvm_host_unshare_guest, gfn, 1, vm);
assert_transition_res(-EPERM, hyp_pin_shared_mem, virt, virt + size);
selftest_state.host = PKVM_PAGE_SHARED_OWNED;
@@ -1264,7 +1269,7 @@ void pkvm_ownership_selftest(void *base)
assert_transition_res(-EPERM, __pkvm_host_share_ffa, pfn, 1);
assert_transition_res(-EPERM, __pkvm_hyp_donate_host, pfn, 1);
assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
- assert_transition_res(-ENOENT, __pkvm_host_unshare_guest, gfn, vm);
+ assert_transition_res(-ENOENT, __pkvm_host_unshare_guest, gfn, 1, vm);
assert_transition_res(0, hyp_pin_shared_mem, virt, virt + size);
assert_transition_res(0, hyp_pin_shared_mem, virt, virt + size);
@@ -1276,7 +1281,7 @@ void pkvm_ownership_selftest(void *base)
assert_transition_res(-EPERM, __pkvm_host_share_ffa, pfn, 1);
assert_transition_res(-EPERM, __pkvm_hyp_donate_host, pfn, 1);
assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
- assert_transition_res(-ENOENT, __pkvm_host_unshare_guest, gfn, vm);
+ assert_transition_res(-ENOENT, __pkvm_host_unshare_guest, gfn, 1, vm);
hyp_unpin_shared_mem(virt, virt + size);
assert_page_state();
@@ -1295,7 +1300,7 @@ void pkvm_ownership_selftest(void *base)
assert_transition_res(-EPERM, __pkvm_host_unshare_hyp, pfn);
assert_transition_res(-EPERM, __pkvm_hyp_donate_host, pfn, 1);
assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
- assert_transition_res(-ENOENT, __pkvm_host_unshare_guest, gfn, vm);
+ assert_transition_res(-ENOENT, __pkvm_host_unshare_guest, gfn, 1, vm);
assert_transition_res(-EPERM, hyp_pin_shared_mem, virt, virt + size);
selftest_state.host = PKVM_PAGE_OWNED;
@@ -1319,11 +1324,11 @@ void pkvm_ownership_selftest(void *base)
WARN_ON(hyp_virt_to_page(virt)->host_share_guest_count != 2);
selftest_state.guest[0] = PKVM_NOPAGE;
- assert_transition_res(0, __pkvm_host_unshare_guest, gfn, vm);
+ assert_transition_res(0, __pkvm_host_unshare_guest, gfn, 1, vm);
selftest_state.guest[1] = PKVM_NOPAGE;
selftest_state.host = PKVM_PAGE_OWNED;
- assert_transition_res(0, __pkvm_host_unshare_guest, gfn + 1, vm);
+ assert_transition_res(0, __pkvm_host_unshare_guest, gfn + 1, 1, vm);
selftest_state.host = PKVM_NOPAGE;
selftest_state.hyp = PKVM_PAGE_OWNED;
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 0285e2cd2e7f..f77c5157a8d7 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -371,7 +371,7 @@ int pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
lockdep_assert_held_write(&kvm->mmu_lock);
for_each_mapping_in_range_safe(pgt, addr, addr + size, mapping) {
- ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_guest, handle, mapping->gfn);
+ ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_guest, handle, mapping->gfn, 1);
if (WARN_ON(ret))
break;
rb_erase(&mapping->node, &pgt->pkvm_mappings);
--
2.49.0.1143.g0be31eac6b-goog
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v5 05/10] KVM: arm64: Add a range to __pkvm_host_wrprotect_guest()
2025-05-20 8:51 [PATCH v5 00/10] Stage-2 huge mappings for pKVM np-guests Vincent Donnefort
` (3 preceding siblings ...)
2025-05-20 8:51 ` [PATCH v5 04/10] KVM: arm64: Add a range to __pkvm_host_unshare_guest() Vincent Donnefort
@ 2025-05-20 8:51 ` Vincent Donnefort
2025-05-20 8:51 ` [PATCH v5 06/10] KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest() Vincent Donnefort
` (4 subsequent siblings)
9 siblings, 0 replies; 14+ messages in thread
From: Vincent Donnefort @ 2025-05-20 8:51 UTC (permalink / raw)
To: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will
Cc: qperret, linux-arm-kernel, kvmarm, linux-kernel, kernel-team,
Vincent Donnefort
In preparation for supporting stage-2 huge mappings for np-guests, add a
nr_pages argument to the __pkvm_host_wrprotect_guest hypercall. This
range supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is, 512
on a 4K-pages system).
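As a rough illustration of the two accepted nr_pages values, the
standalone sketch below computes PMD_SIZE / PAGE_SIZE for a given page
shift. It is not kernel code: pmd_shift() and pmd_nr_pages() are
hypothetical helpers that follow the arm64 level-shift arithmetic,
assuming 8-byte descriptors (PAGE_SIZE / 8 entries per table).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: address shift of a PMD-level (second to last level)
 * entry, assuming 8-byte descriptors, i.e. PAGE_SIZE / 8 entries per table.
 */
static unsigned int pmd_shift(unsigned int page_shift)
{
	return (page_shift - 3) * 2 + 3;
}

/* Number of base pages covered by a single PMD-level block. */
static uint64_t pmd_nr_pages(unsigned int page_shift)
{
	return (UINT64_C(1) << pmd_shift(page_shift)) >> page_shift;
}
```

With 4KiB pages (page_shift == 12) this yields the 512 quoted above.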
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 19671edbe18f..64d4f3bf6269 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -43,8 +43,8 @@ int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu
enum kvm_pgtable_prot prot);
int __pkvm_host_unshare_guest(u64 gfn, u64 nr_pages, struct pkvm_hyp_vm *hyp_vm);
int __pkvm_host_relax_perms_guest(u64 gfn, struct pkvm_hyp_vcpu *vcpu, enum kvm_pgtable_prot prot);
-int __pkvm_host_wrprotect_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm);
int __pkvm_host_test_clear_young_guest(u64 gfn, bool mkold, struct pkvm_hyp_vm *vm);
+int __pkvm_host_wrprotect_guest(u64 gfn, u64 nr_pages, struct pkvm_hyp_vm *hyp_vm);
int __pkvm_host_mkyoung_guest(u64 gfn, struct pkvm_hyp_vcpu *vcpu);
bool addr_is_memory(phys_addr_t phys);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 5c03bd1db873..fa7e2421d359 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -310,6 +310,7 @@ static void handle___pkvm_host_wrprotect_guest(struct kvm_cpu_context *host_ctxt
{
DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
DECLARE_REG(u64, gfn, host_ctxt, 2);
+ DECLARE_REG(u64, nr_pages, host_ctxt, 3);
struct pkvm_hyp_vm *hyp_vm;
int ret = -EINVAL;
@@ -320,7 +321,7 @@ static void handle___pkvm_host_wrprotect_guest(struct kvm_cpu_context *host_ctxt
if (!hyp_vm)
goto out;
- ret = __pkvm_host_wrprotect_guest(gfn, hyp_vm);
+ ret = __pkvm_host_wrprotect_guest(gfn, nr_pages, hyp_vm);
put_pkvm_hyp_vm(hyp_vm);
out:
cpu_reg(host_ctxt, 1) = ret;
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 884e2316aa48..a6c45202aa85 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -1052,7 +1052,7 @@ int __pkvm_host_unshare_guest(u64 gfn, u64 nr_pages, struct pkvm_hyp_vm *vm)
return ret;
}
-static void assert_host_shared_guest(struct pkvm_hyp_vm *vm, u64 ipa)
+static void assert_host_shared_guest(struct pkvm_hyp_vm *vm, u64 ipa, u64 size)
{
u64 phys;
int ret;
@@ -1063,7 +1063,7 @@ static void assert_host_shared_guest(struct pkvm_hyp_vm *vm, u64 ipa)
host_lock_component();
guest_lock_component(vm);
- ret = __check_host_shared_guest(vm, &phys, ipa, PAGE_SIZE);
+ ret = __check_host_shared_guest(vm, &phys, ipa, size);
guest_unlock_component(vm);
host_unlock_component();
@@ -1083,7 +1083,7 @@ int __pkvm_host_relax_perms_guest(u64 gfn, struct pkvm_hyp_vcpu *vcpu, enum kvm_
if (prot & ~KVM_PGTABLE_PROT_RWX)
return -EINVAL;
- assert_host_shared_guest(vm, ipa);
+ assert_host_shared_guest(vm, ipa, PAGE_SIZE);
guest_lock_component(vm);
ret = kvm_pgtable_stage2_relax_perms(&vm->pgt, ipa, prot, 0);
guest_unlock_component(vm);
@@ -1091,17 +1091,21 @@ int __pkvm_host_relax_perms_guest(u64 gfn, struct pkvm_hyp_vcpu *vcpu, enum kvm_
return ret;
}
-int __pkvm_host_wrprotect_guest(u64 gfn, struct pkvm_hyp_vm *vm)
+int __pkvm_host_wrprotect_guest(u64 gfn, u64 nr_pages, struct pkvm_hyp_vm *vm)
{
- u64 ipa = hyp_pfn_to_phys(gfn);
+ u64 size, ipa = hyp_pfn_to_phys(gfn);
int ret;
if (pkvm_hyp_vm_is_protected(vm))
return -EPERM;
- assert_host_shared_guest(vm, ipa);
+ ret = __guest_check_transition_size(0, ipa, nr_pages, &size);
+ if (ret)
+ return ret;
+
+ assert_host_shared_guest(vm, ipa, size);
guest_lock_component(vm);
- ret = kvm_pgtable_stage2_wrprotect(&vm->pgt, ipa, PAGE_SIZE);
+ ret = kvm_pgtable_stage2_wrprotect(&vm->pgt, ipa, size);
guest_unlock_component(vm);
return ret;
@@ -1115,7 +1119,7 @@ int __pkvm_host_test_clear_young_guest(u64 gfn, bool mkold, struct pkvm_hyp_vm *
if (pkvm_hyp_vm_is_protected(vm))
return -EPERM;
- assert_host_shared_guest(vm, ipa);
+ assert_host_shared_guest(vm, ipa, PAGE_SIZE);
guest_lock_component(vm);
ret = kvm_pgtable_stage2_test_clear_young(&vm->pgt, ipa, PAGE_SIZE, mkold);
guest_unlock_component(vm);
@@ -1131,7 +1135,7 @@ int __pkvm_host_mkyoung_guest(u64 gfn, struct pkvm_hyp_vcpu *vcpu)
if (pkvm_hyp_vm_is_protected(vm))
return -EPERM;
- assert_host_shared_guest(vm, ipa);
+ assert_host_shared_guest(vm, ipa, PAGE_SIZE);
guest_lock_component(vm);
kvm_pgtable_stage2_mkyoung(&vm->pgt, ipa, 0);
guest_unlock_component(vm);
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index f77c5157a8d7..daab4a00790a 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -390,7 +390,7 @@ int pkvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
lockdep_assert_held(&kvm->mmu_lock);
for_each_mapping_in_range_safe(pgt, addr, addr + size, mapping) {
- ret = kvm_call_hyp_nvhe(__pkvm_host_wrprotect_guest, handle, mapping->gfn);
+ ret = kvm_call_hyp_nvhe(__pkvm_host_wrprotect_guest, handle, mapping->gfn, 1);
if (WARN_ON(ret))
break;
}
--
2.49.0.1143.g0be31eac6b-goog
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v5 06/10] KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest()
2025-05-20 8:51 [PATCH v5 00/10] Stage-2 huge mappings for pKVM np-guests Vincent Donnefort
` (4 preceding siblings ...)
2025-05-20 8:51 ` [PATCH v5 05/10] KVM: arm64: Add a range to __pkvm_host_wrprotect_guest() Vincent Donnefort
@ 2025-05-20 8:51 ` Vincent Donnefort
2025-05-20 8:51 ` [PATCH v5 07/10] KVM: arm64: Convert pkvm_mappings to interval tree Vincent Donnefort
` (3 subsequent siblings)
9 siblings, 0 replies; 14+ messages in thread
From: Vincent Donnefort @ 2025-05-20 8:51 UTC (permalink / raw)
To: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will
Cc: qperret, linux-arm-kernel, kvmarm, linux-kernel, kernel-team,
Vincent Donnefort
In preparation for supporting stage-2 huge mappings for np-guests, add a
nr_pages argument to the __pkvm_host_test_clear_young_guest hypercall.
This range supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is,
512 on a 4K-pages system).
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 64d4f3bf6269..5f9d56754e39 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -43,8 +43,8 @@ int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu
enum kvm_pgtable_prot prot);
int __pkvm_host_unshare_guest(u64 gfn, u64 nr_pages, struct pkvm_hyp_vm *hyp_vm);
int __pkvm_host_relax_perms_guest(u64 gfn, struct pkvm_hyp_vcpu *vcpu, enum kvm_pgtable_prot prot);
-int __pkvm_host_test_clear_young_guest(u64 gfn, bool mkold, struct pkvm_hyp_vm *vm);
int __pkvm_host_wrprotect_guest(u64 gfn, u64 nr_pages, struct pkvm_hyp_vm *hyp_vm);
+int __pkvm_host_test_clear_young_guest(u64 gfn, u64 nr_pages, bool mkold, struct pkvm_hyp_vm *vm);
int __pkvm_host_mkyoung_guest(u64 gfn, struct pkvm_hyp_vcpu *vcpu);
bool addr_is_memory(phys_addr_t phys);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index fa7e2421d359..8e8848de4d47 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -331,7 +331,8 @@ static void handle___pkvm_host_test_clear_young_guest(struct kvm_cpu_context *ho
{
DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
DECLARE_REG(u64, gfn, host_ctxt, 2);
- DECLARE_REG(bool, mkold, host_ctxt, 3);
+ DECLARE_REG(u64, nr_pages, host_ctxt, 3);
+ DECLARE_REG(bool, mkold, host_ctxt, 4);
struct pkvm_hyp_vm *hyp_vm;
int ret = -EINVAL;
@@ -342,7 +343,7 @@ static void handle___pkvm_host_test_clear_young_guest(struct kvm_cpu_context *ho
if (!hyp_vm)
goto out;
- ret = __pkvm_host_test_clear_young_guest(gfn, mkold, hyp_vm);
+ ret = __pkvm_host_test_clear_young_guest(gfn, nr_pages, mkold, hyp_vm);
put_pkvm_hyp_vm(hyp_vm);
out:
cpu_reg(host_ctxt, 1) = ret;
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index a6c45202aa85..5a7a38c5d67c 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -1111,17 +1111,21 @@ int __pkvm_host_wrprotect_guest(u64 gfn, u64 nr_pages, struct pkvm_hyp_vm *vm)
return ret;
}
-int __pkvm_host_test_clear_young_guest(u64 gfn, bool mkold, struct pkvm_hyp_vm *vm)
+int __pkvm_host_test_clear_young_guest(u64 gfn, u64 nr_pages, bool mkold, struct pkvm_hyp_vm *vm)
{
- u64 ipa = hyp_pfn_to_phys(gfn);
+ u64 size, ipa = hyp_pfn_to_phys(gfn);
int ret;
if (pkvm_hyp_vm_is_protected(vm))
return -EPERM;
- assert_host_shared_guest(vm, ipa, PAGE_SIZE);
+ ret = __guest_check_transition_size(0, ipa, nr_pages, &size);
+ if (ret)
+ return ret;
+
+ assert_host_shared_guest(vm, ipa, size);
guest_lock_component(vm);
- ret = kvm_pgtable_stage2_test_clear_young(&vm->pgt, ipa, PAGE_SIZE, mkold);
+ ret = kvm_pgtable_stage2_test_clear_young(&vm->pgt, ipa, size, mkold);
guest_unlock_component(vm);
return ret;
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index daab4a00790a..057874bbe3e1 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -420,7 +420,7 @@ bool pkvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr, u64
lockdep_assert_held(&kvm->mmu_lock);
for_each_mapping_in_range_safe(pgt, addr, addr + size, mapping)
young |= kvm_call_hyp_nvhe(__pkvm_host_test_clear_young_guest, handle, mapping->gfn,
- mkold);
+ 1, mkold);
return young;
}
--
2.49.0.1143.g0be31eac6b-goog
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v5 07/10] KVM: arm64: Convert pkvm_mappings to interval tree
2025-05-20 8:51 [PATCH v5 00/10] Stage-2 huge mappings for pKVM np-guests Vincent Donnefort
` (5 preceding siblings ...)
2025-05-20 8:51 ` [PATCH v5 06/10] KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest() Vincent Donnefort
@ 2025-05-20 8:51 ` Vincent Donnefort
2025-05-20 8:51 ` [PATCH v5 08/10] KVM: arm64: Add a range to pkvm_mappings Vincent Donnefort
` (2 subsequent siblings)
9 siblings, 0 replies; 14+ messages in thread
From: Vincent Donnefort @ 2025-05-20 8:51 UTC (permalink / raw)
To: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will
Cc: qperret, linux-arm-kernel, kvmarm, linux-kernel, kernel-team,
Vincent Donnefort
From: Quentin Perret <qperret@google.com>
In preparation for supporting stage-2 huge mappings for np-guest, let's
convert pgt.pkvm_mappings to an interval tree.
No functional change intended.
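The interval tree helpers work on closed intervals [start, last], which
is why a half-open range has to be converted with "end - 1" before a
lookup. The sketch below illustrates only that overlap rule on a single
mapping; it is a userspace approximation with hypothetical sketch_*
names and an assumed 4KiB page size, not the tree implementation itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE UINT64_C(4096)	/* assumption: 4KiB pages */

/* Hypothetical stand-in for a single-page mapping. */
struct sketch_mapping {
	uint64_t gfn;
};

/* First byte covered by the mapping. */
static uint64_t sketch_start(const struct sketch_mapping *m)
{
	return m->gfn * SKETCH_PAGE_SIZE;
}

/* Last byte covered by the mapping (inclusive). */
static uint64_t sketch_last(const struct sketch_mapping *m)
{
	return (m->gfn + 1) * SKETCH_PAGE_SIZE - 1;
}

/* Does the mapping intersect the half-open range [start, end)? */
static bool sketch_overlaps(const struct sketch_mapping *m,
			    uint64_t start, uint64_t end)
{
	uint64_t last = end - 1;	/* convert to an inclusive bound */

	return sketch_start(m) <= last && start <= sketch_last(m);
}
```

A mapping at gfn 2 covers bytes 8192..12287, so it overlaps
[8192, 12288) but not the adjacent ranges on either side.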
Suggested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 6b9d274052c7..1b43bcd2a679 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -413,7 +413,7 @@ static inline bool kvm_pgtable_walk_lock_held(void)
*/
struct kvm_pgtable {
union {
- struct rb_root pkvm_mappings;
+ struct rb_root_cached pkvm_mappings;
struct {
u32 ia_bits;
s8 start_level;
diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
index d91bfcf2db56..da75d41c948c 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -173,6 +173,7 @@ struct pkvm_mapping {
struct rb_node node;
u64 gfn;
u64 pfn;
+ u64 __subtree_last; /* Internal member for interval tree */
};
int pkvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 057874bbe3e1..8a1a2faf66a8 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -5,6 +5,7 @@
*/
#include <linux/init.h>
+#include <linux/interval_tree_generic.h>
#include <linux/kmemleak.h>
#include <linux/kvm_host.h>
#include <asm/kvm_mmu.h>
@@ -256,80 +257,67 @@ static int __init finalize_pkvm(void)
}
device_initcall_sync(finalize_pkvm);
-static int cmp_mappings(struct rb_node *node, const struct rb_node *parent)
+static u64 __pkvm_mapping_start(struct pkvm_mapping *m)
{
- struct pkvm_mapping *a = rb_entry(node, struct pkvm_mapping, node);
- struct pkvm_mapping *b = rb_entry(parent, struct pkvm_mapping, node);
-
- if (a->gfn < b->gfn)
- return -1;
- if (a->gfn > b->gfn)
- return 1;
- return 0;
+ return m->gfn * PAGE_SIZE;
}
-static struct rb_node *find_first_mapping_node(struct rb_root *root, u64 gfn)
+static u64 __pkvm_mapping_end(struct pkvm_mapping *m)
{
- struct rb_node *node = root->rb_node, *prev = NULL;
- struct pkvm_mapping *mapping;
-
- while (node) {
- mapping = rb_entry(node, struct pkvm_mapping, node);
- if (mapping->gfn == gfn)
- return node;
- prev = node;
- node = (gfn < mapping->gfn) ? node->rb_left : node->rb_right;
- }
-
- return prev;
+ return (m->gfn + 1) * PAGE_SIZE - 1;
}
+INTERVAL_TREE_DEFINE(struct pkvm_mapping, node, u64, __subtree_last,
+ __pkvm_mapping_start, __pkvm_mapping_end, static,
+ pkvm_mapping);
+
/*
- * __tmp is updated to rb_next(__tmp) *before* entering the body of the loop to allow freeing
- * of __map inline.
+ * __tmp is updated to iter_first(pkvm_mappings) *before* entering the body of the loop to allow
+ * freeing of __map inline.
*/
#define for_each_mapping_in_range_safe(__pgt, __start, __end, __map) \
- for (struct rb_node *__tmp = find_first_mapping_node(&(__pgt)->pkvm_mappings, \
- ((__start) >> PAGE_SHIFT)); \
+ for (struct pkvm_mapping *__tmp = pkvm_mapping_iter_first(&(__pgt)->pkvm_mappings, \
+ __start, __end - 1); \
__tmp && ({ \
- __map = rb_entry(__tmp, struct pkvm_mapping, node); \
- __tmp = rb_next(__tmp); \
+ __map = __tmp; \
+ __tmp = pkvm_mapping_iter_next(__map, __start, __end - 1); \
true; \
}); \
- ) \
- if (__map->gfn < ((__start) >> PAGE_SHIFT)) \
- continue; \
- else if (__map->gfn >= ((__end) >> PAGE_SHIFT)) \
- break; \
- else
+ )
int pkvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
struct kvm_pgtable_mm_ops *mm_ops)
{
- pgt->pkvm_mappings = RB_ROOT;
+ pgt->pkvm_mappings = RB_ROOT_CACHED;
pgt->mmu = mmu;
return 0;
}
-void pkvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
+static int __pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 start, u64 end)
{
struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu);
pkvm_handle_t handle = kvm->arch.pkvm.handle;
struct pkvm_mapping *mapping;
- struct rb_node *node;
+ int ret;
if (!handle)
- return;
+ return 0;
- node = rb_first(&pgt->pkvm_mappings);
- while (node) {
- mapping = rb_entry(node, struct pkvm_mapping, node);
- kvm_call_hyp_nvhe(__pkvm_host_unshare_guest, handle, mapping->gfn);
- node = rb_next(node);
- rb_erase(&mapping->node, &pgt->pkvm_mappings);
+ for_each_mapping_in_range_safe(pgt, start, end, mapping) {
+ ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_guest, handle, mapping->gfn, 1);
+ if (WARN_ON(ret))
+ return ret;
+ pkvm_mapping_remove(mapping, &pgt->pkvm_mappings);
kfree(mapping);
}
+
+ return 0;
+}
+
+void pkvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
+{
+ __pkvm_pgtable_stage2_unmap(pgt, 0, ~(0ULL));
}
int pkvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
@@ -357,28 +345,16 @@ int pkvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
swap(mapping, cache->mapping);
mapping->gfn = gfn;
mapping->pfn = pfn;
- WARN_ON(rb_find_add(&mapping->node, &pgt->pkvm_mappings, cmp_mappings));
+ pkvm_mapping_insert(mapping, &pgt->pkvm_mappings);
return ret;
}
int pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
{
- struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu);
- pkvm_handle_t handle = kvm->arch.pkvm.handle;
- struct pkvm_mapping *mapping;
- int ret = 0;
-
- lockdep_assert_held_write(&kvm->mmu_lock);
- for_each_mapping_in_range_safe(pgt, addr, addr + size, mapping) {
- ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_guest, handle, mapping->gfn, 1);
- if (WARN_ON(ret))
- break;
- rb_erase(&mapping->node, &pgt->pkvm_mappings);
- kfree(mapping);
- }
+ lockdep_assert_held_write(&kvm_s2_mmu_to_kvm(pgt->mmu)->mmu_lock);
- return ret;
+ return __pkvm_pgtable_stage2_unmap(pgt, addr, addr + size);
}
int pkvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
--
2.49.0.1143.g0be31eac6b-goog
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v5 08/10] KVM: arm64: Add a range to pkvm_mappings
2025-05-20 8:51 [PATCH v5 00/10] Stage-2 huge mappings for pKVM np-guests Vincent Donnefort
` (6 preceding siblings ...)
2025-05-20 8:51 ` [PATCH v5 07/10] KVM: arm64: Convert pkvm_mappings to interval tree Vincent Donnefort
@ 2025-05-20 8:51 ` Vincent Donnefort
2025-05-20 8:52 ` [PATCH v5 09/10] KVM: arm64: Stage-2 huge mappings for np-guests Vincent Donnefort
2025-05-20 8:52 ` [PATCH v5 10/10] KVM: arm64: np-guest CMOs with PMD_SIZE fixmap Vincent Donnefort
9 siblings, 0 replies; 14+ messages in thread
From: Vincent Donnefort @ 2025-05-20 8:51 UTC (permalink / raw)
To: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will
Cc: qperret, linux-arm-kernel, kvmarm, linux-kernel, kernel-team,
Vincent Donnefort
From: Quentin Perret <qperret@google.com>
In preparation for supporting stage-2 huge mappings for np-guest, add a
nr_pages member for pkvm_mappings to allow EL1 to track the size of the
stage-2 mapping.
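Once the size is tracked, a map request that lands on an existing
mapping can be classified by comparing sizes. The sketch below is one
way to read that decision, assuming 4KiB pages; the enum and
sketch_map_action() are hypothetical names used for illustration only
and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE UINT64_C(4096)	/* assumption: 4KiB pages */

enum sketch_action {
	SKETCH_SHARE,			/* nothing in the way, share directly */
	SKETCH_RETRY,			/* same-size mapping exists (-EAGAIN case) */
	SKETCH_UNMAP_THEN_SHARE,	/* page <-> block change, unmap first */
};

/*
 * Classify a map request of 'size' bytes given whether an overlapping
 * mapping was found and how many pages that existing mapping covers.
 */
static enum sketch_action sketch_map_action(bool overlap_found,
					    uint64_t existing_nr_pages,
					    uint64_t size)
{
	if (!overlap_found)
		return SKETCH_SHARE;

	if (size == existing_nr_pages * SKETCH_PAGE_SIZE)
		return SKETCH_RETRY;

	return SKETCH_UNMAP_THEN_SHARE;
}
```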
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
index da75d41c948c..ea58282f59bb 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -173,6 +173,7 @@ struct pkvm_mapping {
struct rb_node node;
u64 gfn;
u64 pfn;
+ u64 nr_pages;
u64 __subtree_last; /* Internal member for interval tree */
};
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 8a1a2faf66a8..b1a65f50c02a 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -264,7 +264,7 @@ static u64 __pkvm_mapping_start(struct pkvm_mapping *m)
static u64 __pkvm_mapping_end(struct pkvm_mapping *m)
{
- return (m->gfn + 1) * PAGE_SIZE - 1;
+ return (m->gfn + m->nr_pages) * PAGE_SIZE - 1;
}
INTERVAL_TREE_DEFINE(struct pkvm_mapping, node, u64, __subtree_last,
@@ -305,7 +305,8 @@ static int __pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 start, u64 e
return 0;
for_each_mapping_in_range_safe(pgt, start, end, mapping) {
- ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_guest, handle, mapping->gfn, 1);
+ ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_guest, handle, mapping->gfn,
+ mapping->nr_pages);
if (WARN_ON(ret))
return ret;
pkvm_mapping_remove(mapping, &pgt->pkvm_mappings);
@@ -335,16 +336,32 @@ int pkvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
return -EINVAL;
lockdep_assert_held_write(&kvm->mmu_lock);
- ret = kvm_call_hyp_nvhe(__pkvm_host_share_guest, pfn, gfn, 1, prot);
- if (ret) {
- /* Is the gfn already mapped due to a racing vCPU? */
- if (ret == -EPERM)
+
+ /*
+ * Calling stage2_map() on top of existing mappings is either happening because of a race
+ * with another vCPU, or because we're changing between page and block mappings. As per
+ * user_mem_abort(), same-size permission faults are handled in the relax_perms() path.
+ */
+ mapping = pkvm_mapping_iter_first(&pgt->pkvm_mappings, addr, addr + size - 1);
+ if (mapping) {
+ if (size == (mapping->nr_pages * PAGE_SIZE))
return -EAGAIN;
+
+ /* Remove _any_ pkvm_mapping overlapping with the range, bigger or smaller. */
+ ret = __pkvm_pgtable_stage2_unmap(pgt, addr, addr + size);
+ if (ret)
+ return ret;
+ mapping = NULL;
}
+ ret = kvm_call_hyp_nvhe(__pkvm_host_share_guest, pfn, gfn, size / PAGE_SIZE, prot);
+ if (WARN_ON(ret))
+ return ret;
+
swap(mapping, cache->mapping);
mapping->gfn = gfn;
mapping->pfn = pfn;
+ mapping->nr_pages = size / PAGE_SIZE;
pkvm_mapping_insert(mapping, &pgt->pkvm_mappings);
return ret;
@@ -366,7 +383,8 @@ int pkvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
lockdep_assert_held(&kvm->mmu_lock);
for_each_mapping_in_range_safe(pgt, addr, addr + size, mapping) {
- ret = kvm_call_hyp_nvhe(__pkvm_host_wrprotect_guest, handle, mapping->gfn, 1);
+ ret = kvm_call_hyp_nvhe(__pkvm_host_wrprotect_guest, handle, mapping->gfn,
+ mapping->nr_pages);
if (WARN_ON(ret))
break;
}
@@ -381,7 +399,8 @@ int pkvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size)
lockdep_assert_held(&kvm->mmu_lock);
for_each_mapping_in_range_safe(pgt, addr, addr + size, mapping)
- __clean_dcache_guest_page(pfn_to_kaddr(mapping->pfn), PAGE_SIZE);
+ __clean_dcache_guest_page(pfn_to_kaddr(mapping->pfn),
+ PAGE_SIZE * mapping->nr_pages);
return 0;
}
@@ -396,7 +415,7 @@ bool pkvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr, u64
lockdep_assert_held(&kvm->mmu_lock);
for_each_mapping_in_range_safe(pgt, addr, addr + size, mapping)
young |= kvm_call_hyp_nvhe(__pkvm_host_test_clear_young_guest, handle, mapping->gfn,
- 1, mkold);
+ mapping->nr_pages, mkold);
return young;
}
--
2.49.0.1143.g0be31eac6b-goog
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v5 09/10] KVM: arm64: Stage-2 huge mappings for np-guests
2025-05-20 8:51 [PATCH v5 00/10] Stage-2 huge mappings for pKVM np-guests Vincent Donnefort
` (7 preceding siblings ...)
2025-05-20 8:51 ` [PATCH v5 08/10] KVM: arm64: Add a range to pkvm_mappings Vincent Donnefort
@ 2025-05-20 8:52 ` Vincent Donnefort
2025-05-20 8:52 ` [PATCH v5 10/10] KVM: arm64: np-guest CMOs with PMD_SIZE fixmap Vincent Donnefort
9 siblings, 0 replies; 14+ messages in thread
From: Vincent Donnefort @ 2025-05-20 8:52 UTC (permalink / raw)
To: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will
Cc: qperret, linux-arm-kernel, kvmarm, linux-kernel, kernel-team,
Vincent Donnefort
Now that np-guest hypercalls with a range are supported, we can let the
hypervisor install block mappings whenever the stage-1 allows it, that
is, when backed by either Hugetlbfs or THPs. The size of those block
mappings is limited to PMD_SIZE.
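The size restriction can be tried out in userspace with the sketch
below. It assumes 4KiB pages and a 2MiB block, and the sketch_* names
are hypothetical; it only mirrors the shape of the check (one page, or
exactly one block with both addresses block-aligned).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT	12				/* assumption: 4KiB pages */
#define SKETCH_PAGE_SIZE	(UINT64_C(1) << SKETCH_PAGE_SHIFT)
#define SKETCH_BLOCK_SIZE	(UINT64_C(1) << 21)		/* assumption: 2MiB block */

/*
 * Accept either a single page or exactly one block. For a block, both the
 * physical address and the IPA must be block-aligned.
 */
static int sketch_check_transition_size(uint64_t phys, uint64_t ipa,
					uint64_t nr_pages, uint64_t *size)
{
	if (nr_pages == 1) {
		*size = SKETCH_PAGE_SIZE;
		return 0;
	}

	if (nr_pages != SKETCH_BLOCK_SIZE >> SKETCH_PAGE_SHIFT)
		return -EINVAL;

	if ((phys | ipa) & (SKETCH_BLOCK_SIZE - 1))
		return -EINVAL;

	*size = SKETCH_BLOCK_SIZE;
	return 0;
}
```

OR-ing the two addresses before masking checks both alignments in one
test, since any low bit set in either one survives the OR.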
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 5a7a38c5d67c..1490820b9ebe 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -166,12 +166,6 @@ int kvm_host_prepare_stage2(void *pgt_pool_base)
return 0;
}
-static bool guest_stage2_force_pte_cb(u64 addr, u64 end,
- enum kvm_pgtable_prot prot)
-{
- return true;
-}
-
static void *guest_s2_zalloc_pages_exact(size_t size)
{
void *addr = hyp_alloc_pages(&current_vm->pool, get_order(size));
@@ -278,8 +272,7 @@ int kvm_guest_prepare_stage2(struct pkvm_hyp_vm *vm, void *pgd)
};
guest_lock_component(vm);
- ret = __kvm_pgtable_stage2_init(mmu->pgt, mmu, &vm->mm_ops, 0,
- guest_stage2_force_pte_cb);
+ ret = __kvm_pgtable_stage2_init(mmu->pgt, mmu, &vm->mm_ops, 0, NULL);
guest_unlock_component(vm);
if (ret)
return ret;
@@ -908,12 +901,24 @@ int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages)
static int __guest_check_transition_size(u64 phys, u64 ipa, u64 nr_pages, u64 *size)
{
+ size_t block_size;
+
if (nr_pages == 1) {
*size = PAGE_SIZE;
return 0;
}
- return -EINVAL;
+ /* We solely support second to last level huge mapping */
+ block_size = kvm_granule_size(KVM_PGTABLE_LAST_LEVEL - 1);
+
+ if (nr_pages != block_size >> PAGE_SHIFT)
+ return -EINVAL;
+
+ if (!IS_ALIGNED(phys | ipa, block_size))
+ return -EINVAL;
+
+ *size = block_size;
+ return 0;
}
int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu *vcpu,
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 754f2fe0cc67..e445db2cb4a4 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1304,6 +1304,10 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
if (map_size == PAGE_SIZE)
return true;
+ /* pKVM only supports PMD_SIZE huge-mappings */
+ if (is_protected_kvm_enabled() && map_size != PMD_SIZE)
+ return false;
+
size = memslot->npages * PAGE_SIZE;
gpa_start = memslot->base_gfn << PAGE_SHIFT;
@@ -1537,7 +1541,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
* logging_active is guaranteed to never be true for VM_PFNMAP
* memslots.
*/
- if (logging_active || is_protected_kvm_enabled()) {
+ if (logging_active) {
force_pte = true;
vma_shift = PAGE_SHIFT;
} else {
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index b1a65f50c02a..fcd70bfe44fb 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -332,7 +332,7 @@ int pkvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
u64 pfn = phys >> PAGE_SHIFT;
int ret;
- if (size != PAGE_SIZE)
+ if (size != PAGE_SIZE && size != PMD_SIZE)
return -EINVAL;
lockdep_assert_held_write(&kvm->mmu_lock);
--
2.49.0.1143.g0be31eac6b-goog
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v5 10/10] KVM: arm64: np-guest CMOs with PMD_SIZE fixmap
2025-05-20 8:51 [PATCH v5 00/10] Stage-2 huge mappings for pKVM np-guests Vincent Donnefort
` (8 preceding siblings ...)
2025-05-20 8:52 ` [PATCH v5 09/10] KVM: arm64: Stage-2 huge mappings for np-guests Vincent Donnefort
@ 2025-05-20 8:52 ` Vincent Donnefort
2025-05-21 11:01 ` Marc Zyngier
9 siblings, 1 reply; 14+ messages in thread
From: Vincent Donnefort @ 2025-05-20 8:52 UTC (permalink / raw)
To: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will
Cc: qperret, linux-arm-kernel, kvmarm, linux-kernel, kernel-team,
Vincent Donnefort
With the introduction of stage-2 huge mappings in the pKVM hypervisor,
guest page CMOs are needed for PMD_SIZE ranges. The fixmap only supports
PAGE_SIZE, and iterating over the huge page is time-consuming (mostly due
to the TLBI in hyp_fixmap_unmap), which is a problem for EL2 latency.
Introduce a shared PMD_SIZE fixmap (hyp_fixblock_map/hyp_fixblock_unmap)
to improve guest page CMOs when stage-2 huge mappings are installed.
On a Pixel6, the iterative solution resulted in a latency of ~700us,
while the PMD_SIZE fixmap reduces it to ~100us.
Because of the horrendous private range allocation that would be
necessary, this is disabled on 64KiB-page systems.
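To give a feel for where the saving comes from, the sketch below counts
map/unmap cycles (each one ending in a TLBI) for both strategies. It is
a simplified model with hypothetical sketch_* names: it assumes 4KiB
pages, a 2MiB block, a page-multiple size, a block-aligned start
address, and a block mapping that always succeeds whenever at least one
block remains.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE	(UINT64_C(1) << 12)	/* assumption: 4KiB pages */
#define SKETCH_BLOCK_SIZE	(UINT64_C(1) << 21)	/* assumption: 2MiB block */

/* Page-by-page strategy: one map/unmap cycle per page. */
static uint64_t sketch_cycles_per_page(uint64_t size)
{
	return size / SKETCH_PAGE_SIZE;
}

/*
 * Block-aware strategy: use one block-sized cycle while at least a full
 * block remains, then fall back to page-sized cycles for the tail.
 */
static uint64_t sketch_cycles_with_block(uint64_t size)
{
	uint64_t cycles = 0;

	while (size) {
		uint64_t step = size >= SKETCH_BLOCK_SIZE ?
				SKETCH_BLOCK_SIZE : SKETCH_PAGE_SIZE;

		size -= step;
		cycles++;
	}

	return cycles;
}
```

For a single 2MiB huge page the model goes from 512 cycles to 1; the
measured latencies above remain the authoritative numbers.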
Suggested-by: Quentin Perret <qperret@google.com>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 1b43bcd2a679..2888b5d03757 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -59,6 +59,11 @@ typedef u64 kvm_pte_t;
#define KVM_PHYS_INVALID (-1ULL)
+#define KVM_PTE_TYPE BIT(1)
+#define KVM_PTE_TYPE_BLOCK 0
+#define KVM_PTE_TYPE_PAGE 1
+#define KVM_PTE_TYPE_TABLE 1
+
#define KVM_PTE_LEAF_ATTR_LO GENMASK(11, 2)
#define KVM_PTE_LEAF_ATTR_LO_S1_ATTRIDX GENMASK(4, 2)
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
index 230e4f2527de..6e83ce35c2f2 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
@@ -13,9 +13,11 @@
extern struct kvm_pgtable pkvm_pgtable;
extern hyp_spinlock_t pkvm_pgd_lock;
-int hyp_create_pcpu_fixmap(void);
+int hyp_create_fixmap(void);
void *hyp_fixmap_map(phys_addr_t phys);
void hyp_fixmap_unmap(void);
+void *hyp_fixblock_map(phys_addr_t phys, size_t *size);
+void hyp_fixblock_unmap(void);
int hyp_create_idmap(u32 hyp_va_bits);
int hyp_map_vectors(void);
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 1490820b9ebe..962948534179 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -216,34 +216,42 @@ static void guest_s2_put_page(void *addr)
hyp_put_page(&current_vm->pool, addr);
}
-static void clean_dcache_guest_page(void *va, size_t size)
+static void __apply_guest_page(void *va, size_t size,
+ void (*func)(void *addr, size_t size))
{
size += va - PTR_ALIGN_DOWN(va, PAGE_SIZE);
va = PTR_ALIGN_DOWN(va, PAGE_SIZE);
size = PAGE_ALIGN(size);
while (size) {
- __clean_dcache_guest_page(hyp_fixmap_map(__hyp_pa(va)),
- PAGE_SIZE);
- hyp_fixmap_unmap();
- va += PAGE_SIZE;
- size -= PAGE_SIZE;
+ size_t map_size = PAGE_SIZE;
+ void *map;
+
+ if (size >= PMD_SIZE)
+ map = hyp_fixblock_map(__hyp_pa(va), &map_size);
+ else
+ map = hyp_fixmap_map(__hyp_pa(va));
+
+ func(map, map_size);
+
+ if (size >= PMD_SIZE)
+ hyp_fixblock_unmap();
+ else
+ hyp_fixmap_unmap();
+
+ size -= map_size;
+ va += map_size;
}
}
-static void invalidate_icache_guest_page(void *va, size_t size)
+static void clean_dcache_guest_page(void *va, size_t size)
{
- size += va - PTR_ALIGN_DOWN(va, PAGE_SIZE);
- va = PTR_ALIGN_DOWN(va, PAGE_SIZE);
- size = PAGE_ALIGN(size);
+ __apply_guest_page(va, size, __clean_dcache_guest_page);
+}
- while (size) {
- __invalidate_icache_guest_page(hyp_fixmap_map(__hyp_pa(va)),
- PAGE_SIZE);
- hyp_fixmap_unmap();
- va += PAGE_SIZE;
- size -= PAGE_SIZE;
- }
+static void invalidate_icache_guest_page(void *va, size_t size)
+{
+ __apply_guest_page(va, size, __invalidate_icache_guest_page);
}
int kvm_guest_prepare_stage2(struct pkvm_hyp_vm *vm, void *pgd)
diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
index f41c7440b34b..ae8391baebc3 100644
--- a/arch/arm64/kvm/hyp/nvhe/mm.c
+++ b/arch/arm64/kvm/hyp/nvhe/mm.c
@@ -229,9 +229,8 @@ int hyp_map_vectors(void)
return 0;
}
-void *hyp_fixmap_map(phys_addr_t phys)
+static void *fixmap_map_slot(struct hyp_fixmap_slot *slot, phys_addr_t phys)
{
- struct hyp_fixmap_slot *slot = this_cpu_ptr(&fixmap_slots);
kvm_pte_t pte, *ptep = slot->ptep;
pte = *ptep;
@@ -243,10 +242,21 @@ void *hyp_fixmap_map(phys_addr_t phys)
return (void *)slot->addr;
}
+void *hyp_fixmap_map(phys_addr_t phys)
+{
+ return fixmap_map_slot(this_cpu_ptr(&fixmap_slots), phys);
+}
+
static void fixmap_clear_slot(struct hyp_fixmap_slot *slot)
{
kvm_pte_t *ptep = slot->ptep;
u64 addr = slot->addr;
+ u32 level;
+
+ if (FIELD_GET(KVM_PTE_TYPE, *ptep) == KVM_PTE_TYPE_PAGE)
+ level = KVM_PGTABLE_LAST_LEVEL;
+ else
+ level = KVM_PGTABLE_LAST_LEVEL - 1; /* create_fixblock() guarantees PMD level */
WRITE_ONCE(*ptep, *ptep & ~KVM_PTE_VALID);
@@ -260,7 +270,7 @@ static void fixmap_clear_slot(struct hyp_fixmap_slot *slot)
* https://lore.kernel.org/kvm/20221017115209.2099-1-will@kernel.org/T/#mf10dfbaf1eaef9274c581b81c53758918c1d0f03
*/
dsb(ishst);
- __tlbi_level(vale2is, __TLBI_VADDR(addr, 0), KVM_PGTABLE_LAST_LEVEL);
+ __tlbi_level(vale2is, __TLBI_VADDR(addr, 0), level);
dsb(ish);
isb();
}
@@ -273,9 +283,9 @@ void hyp_fixmap_unmap(void)
static int __create_fixmap_slot_cb(const struct kvm_pgtable_visit_ctx *ctx,
enum kvm_pgtable_walk_flags visit)
{
- struct hyp_fixmap_slot *slot = per_cpu_ptr(&fixmap_slots, (u64)ctx->arg);
+ struct hyp_fixmap_slot *slot = (struct hyp_fixmap_slot *)ctx->arg;
- if (!kvm_pte_valid(ctx->old) || ctx->level != KVM_PGTABLE_LAST_LEVEL)
+ if (!kvm_pte_valid(ctx->old) || (ctx->end - ctx->start) != kvm_granule_size(ctx->level))
return -EINVAL;
slot->addr = ctx->addr;
@@ -296,13 +306,84 @@ static int create_fixmap_slot(u64 addr, u64 cpu)
struct kvm_pgtable_walker walker = {
.cb = __create_fixmap_slot_cb,
.flags = KVM_PGTABLE_WALK_LEAF,
- .arg = (void *)cpu,
+ .arg = per_cpu_ptr(&fixmap_slots, cpu),
};
return kvm_pgtable_walk(&pkvm_pgtable, addr, PAGE_SIZE, &walker);
}
-int hyp_create_pcpu_fixmap(void)
+#if PAGE_SHIFT < 16
+#define HAS_FIXBLOCK
+static struct hyp_fixmap_slot hyp_fixblock_slot;
+static DEFINE_HYP_SPINLOCK(hyp_fixblock_lock);
+#endif
+
+static int create_fixblock(void)
+{
+#ifdef HAS_FIXBLOCK
+ struct kvm_pgtable_walker walker = {
+ .cb = __create_fixmap_slot_cb,
+ .flags = KVM_PGTABLE_WALK_LEAF,
+ .arg = &hyp_fixblock_slot,
+ };
+ unsigned long addr;
+ phys_addr_t phys;
+ int ret, i;
+
+ /* Find a RAM phys address, PMD aligned */
+ for (i = 0; i < hyp_memblock_nr; i++) {
+ phys = ALIGN(hyp_memory[i].base, PMD_SIZE);
+ if (phys + PMD_SIZE < (hyp_memory[i].base + hyp_memory[i].size))
+ break;
+ }
+
+ if (i >= hyp_memblock_nr)
+ return -EINVAL;
+
+ hyp_spin_lock(&pkvm_pgd_lock);
+ addr = ALIGN(__io_map_base, PMD_SIZE);
+ ret = __pkvm_alloc_private_va_range(addr, PMD_SIZE);
+ if (ret)
+ goto unlock;
+
+ ret = kvm_pgtable_hyp_map(&pkvm_pgtable, addr, PMD_SIZE, phys, PAGE_HYP);
+ if (ret)
+ goto unlock;
+
+ ret = kvm_pgtable_walk(&pkvm_pgtable, addr, PMD_SIZE, &walker);
+
+unlock:
+ hyp_spin_unlock(&pkvm_pgd_lock);
+
+ return ret;
+#else
+ return 0;
+#endif
+}
+
+void *hyp_fixblock_map(phys_addr_t phys, size_t *size)
+{
+#ifdef HAS_FIXBLOCK
+ *size = PMD_SIZE;
+ hyp_spin_lock(&hyp_fixblock_lock);
+ return fixmap_map_slot(&hyp_fixblock_slot, phys);
+#else
+ *size = PAGE_SIZE;
+ return hyp_fixmap_map(phys);
+#endif
+}
+
+void hyp_fixblock_unmap(void)
+{
+#ifdef HAS_FIXBLOCK
+ fixmap_clear_slot(&hyp_fixblock_slot);
+ hyp_spin_unlock(&hyp_fixblock_lock);
+#else
+ hyp_fixmap_unmap();
+#endif
+}
+
+int hyp_create_fixmap(void)
{
unsigned long addr, i;
int ret;
@@ -322,7 +403,7 @@ int hyp_create_pcpu_fixmap(void)
return ret;
}
- return 0;
+ return create_fixblock();
}
int hyp_create_idmap(u32 hyp_va_bits)
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index c19860fc8183..a48d3f5a5afb 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -312,7 +312,7 @@ void __noreturn __pkvm_init_finalise(void)
if (ret)
goto out;
- ret = hyp_create_pcpu_fixmap();
+ ret = hyp_create_fixmap();
if (ret)
goto out;
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index df5cc74a7dd0..c351b4abd5db 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -11,12 +11,6 @@
#include <asm/kvm_pgtable.h>
#include <asm/stage2_pgtable.h>
-
-#define KVM_PTE_TYPE BIT(1)
-#define KVM_PTE_TYPE_BLOCK 0
-#define KVM_PTE_TYPE_PAGE 1
-#define KVM_PTE_TYPE_TABLE 1
-
struct kvm_pgtable_walk_data {
struct kvm_pgtable_walker *walker;
--
2.49.0.1143.g0be31eac6b-goog
* Re: [PATCH v5 10/10] KVM: arm64: np-guest CMOs with PMD_SIZE fixmap
2025-05-20 8:52 ` [PATCH v5 10/10] KVM: arm64: np-guest CMOs with PMD_SIZE fixmap Vincent Donnefort
@ 2025-05-21 11:01 ` Marc Zyngier
2025-05-21 11:43 ` Vincent Donnefort
0 siblings, 1 reply; 14+ messages in thread
From: Marc Zyngier @ 2025-05-21 11:01 UTC (permalink / raw)
To: Vincent Donnefort
Cc: oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, qperret, linux-arm-kernel, kvmarm,
linux-kernel, kernel-team
On Tue, 20 May 2025 09:52:01 +0100,
Vincent Donnefort <vdonnefort@google.com> wrote:
>
> With the introduction of stage-2 huge mappings in the pKVM hypervisor,
> guest pages CMO is needed for PMD_SIZE size. Fixmap only supports
> PAGE_SIZE and iterating over the huge-page is time consuming (mostly due
> to TLBI on hyp_fixmap_unmap) which is a problem for EL2 latency.
>
> Introduce a shared PMD_SIZE fixmap (hyp_fixblock_map/hyp_fixblock_unmap)
> to improve guest page CMOs when stage-2 huge mappings are installed.
>
> On a Pixel6, the iterative solution resulted in a latency of ~700us,
> while the PMD_SIZE fixmap reduces it to ~100us.
>
> Because of the horrendous private range allocation that would be
> necessary, this is disabled for 64KiB pages systems.
>
> Suggested-by: Quentin Perret <qperret@google.com>
> Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
> Signed-off-by: Quentin Perret <qperret@google.com>
>
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 1b43bcd2a679..2888b5d03757 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -59,6 +59,11 @@ typedef u64 kvm_pte_t;
>
> #define KVM_PHYS_INVALID (-1ULL)
>
> +#define KVM_PTE_TYPE BIT(1)
> +#define KVM_PTE_TYPE_BLOCK 0
> +#define KVM_PTE_TYPE_PAGE 1
> +#define KVM_PTE_TYPE_TABLE 1
> +
> #define KVM_PTE_LEAF_ATTR_LO GENMASK(11, 2)
>
> #define KVM_PTE_LEAF_ATTR_LO_S1_ATTRIDX GENMASK(4, 2)
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
> index 230e4f2527de..6e83ce35c2f2 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
> @@ -13,9 +13,11 @@
> extern struct kvm_pgtable pkvm_pgtable;
> extern hyp_spinlock_t pkvm_pgd_lock;
>
> -int hyp_create_pcpu_fixmap(void);
> +int hyp_create_fixmap(void);
> void *hyp_fixmap_map(phys_addr_t phys);
> void hyp_fixmap_unmap(void);
> +void *hyp_fixblock_map(phys_addr_t phys, size_t *size);
> +void hyp_fixblock_unmap(void);
>
> int hyp_create_idmap(u32 hyp_va_bits);
> int hyp_map_vectors(void);
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 1490820b9ebe..962948534179 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -216,34 +216,42 @@ static void guest_s2_put_page(void *addr)
> hyp_put_page(&current_vm->pool, addr);
> }
>
> -static void clean_dcache_guest_page(void *va, size_t size)
> +static void __apply_guest_page(void *va, size_t size,
> + void (*func)(void *addr, size_t size))
> {
> size += va - PTR_ALIGN_DOWN(va, PAGE_SIZE);
> va = PTR_ALIGN_DOWN(va, PAGE_SIZE);
> size = PAGE_ALIGN(size);
>
> while (size) {
> - __clean_dcache_guest_page(hyp_fixmap_map(__hyp_pa(va)),
> - PAGE_SIZE);
> - hyp_fixmap_unmap();
> - va += PAGE_SIZE;
> - size -= PAGE_SIZE;
> + size_t map_size = PAGE_SIZE;
> + void *map;
> +
> + if (size >= PMD_SIZE)
> + map = hyp_fixblock_map(__hyp_pa(va), &map_size);
You seem to consider that if size is PMD_SIZE (or more), then va must
be PMD aligned. I don't think this is correct.
Such an iterator should start by doing PAGE_SIZEd operations until va
is PMD-aligned. Only at this point can it perform PMD_SIZEd
operations, until the remaining size is less than PMD_SIZE. And at
that point, it's PAGE_SIZE all over again until the end.
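A minimal user-space sketch of that three-phase walk follows. It is an illustration only, not the fix that was eventually posted: the SK_* constants (4KiB pages, 2MiB blocks), struct sk_stats and sk_apply_range() are hypothetical names, and counters stand in for the real map/CMO/unmap calls so the loop can be checked outside the hypervisor.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes for the sketch: 4KiB pages, 2MiB PMD-level blocks. */
#define SK_PAGE_SIZE ((size_t)4096)
#define SK_PMD_SIZE  ((size_t)2 * 1024 * 1024)

/* Hypothetical counters standing in for the real map/CMO/unmap work. */
struct sk_stats {
	size_t page_ops;   /* PAGE_SIZE operations performed */
	size_t block_ops;  /* PMD_SIZE operations performed */
	size_t bytes;      /* total bytes covered */
};

static void sk_apply_range(uintptr_t va, size_t size, struct sk_stats *st)
{
	/* Page-align the range, as the patch does before looping. */
	size += va & (SK_PAGE_SIZE - 1);
	va &= ~(uintptr_t)(SK_PAGE_SIZE - 1);
	size = (size + SK_PAGE_SIZE - 1) & ~(SK_PAGE_SIZE - 1);

	while (size) {
		size_t map_size = SK_PAGE_SIZE;

		/*
		 * Use a block only when va is PMD aligned AND at least a
		 * full block remains. Otherwise fall back to one page,
		 * which covers both the unaligned head and the short tail.
		 */
		if (!(va & (SK_PMD_SIZE - 1)) && size >= SK_PMD_SIZE) {
			map_size = SK_PMD_SIZE;
			st->block_ops++;
		} else {
			st->page_ops++;
		}

		st->bytes += map_size;
		va += map_size;
		size -= map_size;
	}
}
```

With these assumed sizes, a range starting one page below a 2MiB boundary and spanning 2MiB plus two pages is handled as one page, one block, then one page. Testing both alignment and remaining size in a single condition keeps the head, body and tail phases in one loop.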
Does that make sense to you?
M.
--
Without deviation from the norm, progress is not possible.
* Re: [PATCH v5 10/10] KVM: arm64: np-guest CMOs with PMD_SIZE fixmap
2025-05-21 11:01 ` Marc Zyngier
@ 2025-05-21 11:43 ` Vincent Donnefort
2025-05-21 12:04 ` Marc Zyngier
0 siblings, 1 reply; 14+ messages in thread
From: Vincent Donnefort @ 2025-05-21 11:43 UTC (permalink / raw)
To: Marc Zyngier
Cc: oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, qperret, linux-arm-kernel, kvmarm,
linux-kernel, kernel-team
On Wed, May 21, 2025 at 12:01:26PM +0100, Marc Zyngier wrote:
> On Tue, 20 May 2025 09:52:01 +0100,
> Vincent Donnefort <vdonnefort@google.com> wrote:
> >
> > With the introduction of stage-2 huge mappings in the pKVM hypervisor,
> > guest pages CMO is needed for PMD_SIZE size. Fixmap only supports
> > PAGE_SIZE and iterating over the huge-page is time consuming (mostly due
> > to TLBI on hyp_fixmap_unmap) which is a problem for EL2 latency.
> >
> > Introduce a shared PMD_SIZE fixmap (hyp_fixblock_map/hyp_fixblock_unmap)
> > to improve guest page CMOs when stage-2 huge mappings are installed.
> >
> > On a Pixel6, the iterative solution resulted in a latency of ~700us,
> > while the PMD_SIZE fixmap reduces it to ~100us.
> >
> > Because of the horrendous private range allocation that would be
> > necessary, this is disabled for 64KiB pages systems.
> >
> > Suggested-by: Quentin Perret <qperret@google.com>
> > Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
> > Signed-off-by: Quentin Perret <qperret@google.com>
> >
> > diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> > index 1b43bcd2a679..2888b5d03757 100644
> > --- a/arch/arm64/include/asm/kvm_pgtable.h
> > +++ b/arch/arm64/include/asm/kvm_pgtable.h
> > @@ -59,6 +59,11 @@ typedef u64 kvm_pte_t;
> >
> > #define KVM_PHYS_INVALID (-1ULL)
> >
> > +#define KVM_PTE_TYPE BIT(1)
> > +#define KVM_PTE_TYPE_BLOCK 0
> > +#define KVM_PTE_TYPE_PAGE 1
> > +#define KVM_PTE_TYPE_TABLE 1
> > +
> > #define KVM_PTE_LEAF_ATTR_LO GENMASK(11, 2)
> >
> > #define KVM_PTE_LEAF_ATTR_LO_S1_ATTRIDX GENMASK(4, 2)
> > diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
> > index 230e4f2527de..6e83ce35c2f2 100644
> > --- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
> > +++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
> > @@ -13,9 +13,11 @@
> > extern struct kvm_pgtable pkvm_pgtable;
> > extern hyp_spinlock_t pkvm_pgd_lock;
> >
> > -int hyp_create_pcpu_fixmap(void);
> > +int hyp_create_fixmap(void);
> > void *hyp_fixmap_map(phys_addr_t phys);
> > void hyp_fixmap_unmap(void);
> > +void *hyp_fixblock_map(phys_addr_t phys, size_t *size);
> > +void hyp_fixblock_unmap(void);
> >
> > int hyp_create_idmap(u32 hyp_va_bits);
> > int hyp_map_vectors(void);
> > diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > index 1490820b9ebe..962948534179 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > @@ -216,34 +216,42 @@ static void guest_s2_put_page(void *addr)
> > hyp_put_page(&current_vm->pool, addr);
> > }
> >
> > -static void clean_dcache_guest_page(void *va, size_t size)
> > +static void __apply_guest_page(void *va, size_t size,
> > + void (*func)(void *addr, size_t size))
> > {
> > size += va - PTR_ALIGN_DOWN(va, PAGE_SIZE);
> > va = PTR_ALIGN_DOWN(va, PAGE_SIZE);
> > size = PAGE_ALIGN(size);
> >
> > while (size) {
> > - __clean_dcache_guest_page(hyp_fixmap_map(__hyp_pa(va)),
> > - PAGE_SIZE);
> > - hyp_fixmap_unmap();
> > - va += PAGE_SIZE;
> > - size -= PAGE_SIZE;
> > + size_t map_size = PAGE_SIZE;
> > + void *map;
> > +
> > + if (size >= PMD_SIZE)
> > + map = hyp_fixblock_map(__hyp_pa(va), &map_size);
>
> You seem to consider that if size is PMD_SIZE (or more), then va must
> be PMD aligned. I don't think this is correct.
>
> Such an iterator should start by doing PAGE_SIZEd operations until va
> is PMD-aligned. Only at this point can it perform PMD_SIZEd
> operations, until the remaining size is less than PMD_SIZE. And at
> that point, it's PAGE_SIZE all over again until the end.
Arg yes you're right :-\
Shall I respin a v6 with that fix or shall I wait a bit more?
>
> Does that make sense to you?
>
> M.
>
> --
> Without deviation from the norm, progress is not possible.
* Re: [PATCH v5 10/10] KVM: arm64: np-guest CMOs with PMD_SIZE fixmap
2025-05-21 11:43 ` Vincent Donnefort
@ 2025-05-21 12:04 ` Marc Zyngier
0 siblings, 0 replies; 14+ messages in thread
From: Marc Zyngier @ 2025-05-21 12:04 UTC (permalink / raw)
To: Vincent Donnefort
Cc: oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, qperret, linux-arm-kernel, kvmarm,
linux-kernel, kernel-team
On Wed, 21 May 2025 12:43:08 +0100,
Vincent Donnefort <vdonnefort@google.com> wrote:
>
> On Wed, May 21, 2025 at 12:01:26PM +0100, Marc Zyngier wrote:
> > On Tue, 20 May 2025 09:52:01 +0100,
> > Vincent Donnefort <vdonnefort@google.com> wrote:
> > >
> > > With the introduction of stage-2 huge mappings in the pKVM hypervisor,
> > > guest pages CMO is needed for PMD_SIZE size. Fixmap only supports
> > > PAGE_SIZE and iterating over the huge-page is time consuming (mostly due
> > > to TLBI on hyp_fixmap_unmap) which is a problem for EL2 latency.
> > >
> > > Introduce a shared PMD_SIZE fixmap (hyp_fixblock_map/hyp_fixblock_unmap)
> > > to improve guest page CMOs when stage-2 huge mappings are installed.
> > >
> > > On a Pixel6, the iterative solution resulted in a latency of ~700us,
> > > while the PMD_SIZE fixmap reduces it to ~100us.
> > >
> > > Because of the horrendous private range allocation that would be
> > > necessary, this is disabled for 64KiB pages systems.
> > >
> > > Suggested-by: Quentin Perret <qperret@google.com>
> > > Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
> > > Signed-off-by: Quentin Perret <qperret@google.com>
> > >
> > > diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> > > index 1b43bcd2a679..2888b5d03757 100644
> > > --- a/arch/arm64/include/asm/kvm_pgtable.h
> > > +++ b/arch/arm64/include/asm/kvm_pgtable.h
> > > @@ -59,6 +59,11 @@ typedef u64 kvm_pte_t;
> > >
> > > #define KVM_PHYS_INVALID (-1ULL)
> > >
> > > +#define KVM_PTE_TYPE BIT(1)
> > > +#define KVM_PTE_TYPE_BLOCK 0
> > > +#define KVM_PTE_TYPE_PAGE 1
> > > +#define KVM_PTE_TYPE_TABLE 1
> > > +
> > > #define KVM_PTE_LEAF_ATTR_LO GENMASK(11, 2)
> > >
> > > #define KVM_PTE_LEAF_ATTR_LO_S1_ATTRIDX GENMASK(4, 2)
> > > diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
> > > index 230e4f2527de..6e83ce35c2f2 100644
> > > --- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
> > > +++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
> > > @@ -13,9 +13,11 @@
> > > extern struct kvm_pgtable pkvm_pgtable;
> > > extern hyp_spinlock_t pkvm_pgd_lock;
> > >
> > > -int hyp_create_pcpu_fixmap(void);
> > > +int hyp_create_fixmap(void);
> > > void *hyp_fixmap_map(phys_addr_t phys);
> > > void hyp_fixmap_unmap(void);
> > > +void *hyp_fixblock_map(phys_addr_t phys, size_t *size);
> > > +void hyp_fixblock_unmap(void);
> > >
> > > int hyp_create_idmap(u32 hyp_va_bits);
> > > int hyp_map_vectors(void);
> > > diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > > index 1490820b9ebe..962948534179 100644
> > > --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > > +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > > @@ -216,34 +216,42 @@ static void guest_s2_put_page(void *addr)
> > > hyp_put_page(&current_vm->pool, addr);
> > > }
> > >
> > > -static void clean_dcache_guest_page(void *va, size_t size)
> > > +static void __apply_guest_page(void *va, size_t size,
> > > + void (*func)(void *addr, size_t size))
> > > {
> > > size += va - PTR_ALIGN_DOWN(va, PAGE_SIZE);
> > > va = PTR_ALIGN_DOWN(va, PAGE_SIZE);
> > > size = PAGE_ALIGN(size);
> > >
> > > while (size) {
> > > - __clean_dcache_guest_page(hyp_fixmap_map(__hyp_pa(va)),
> > > - PAGE_SIZE);
> > > - hyp_fixmap_unmap();
> > > - va += PAGE_SIZE;
> > > - size -= PAGE_SIZE;
> > > + size_t map_size = PAGE_SIZE;
> > > + void *map;
> > > +
> > > + if (size >= PMD_SIZE)
> > > + map = hyp_fixblock_map(__hyp_pa(va), &map_size);
> >
> > You seem to consider that if size is PMD_SIZE (or more), then va must
> > be PMD aligned. I don't think this is correct.
> >
> > Such an iterator should start by doing PAGE_SIZEd operations until va
> > is PMD-aligned. Only at this point can it perform PMD_SIZEd
> > operations, until the remaining size is less than PMD_SIZE. And at
> > that point, it's PAGE_SIZE all over again until the end.
>
> Arg yes you're right :-\
>
> Shall I respin a v6 with that fix or shall I wait a bit more?
Please send a new version ASAP, as I'm really getting very close to
locking down the tree (and I keep finding embarrassing bugs...).
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
Thread overview: 14+ messages
2025-05-20 8:51 [PATCH v5 00/10] Stage-2 huge mappings for pKVM np-guests Vincent Donnefort
2025-05-20 8:51 ` [PATCH v5 01/10] KVM: arm64: Handle huge mappings for np-guest CMOs Vincent Donnefort
2025-05-20 8:51 ` [PATCH v5 02/10] KVM: arm64: Introduce for_each_hyp_page Vincent Donnefort
2025-05-20 8:51 ` [PATCH v5 03/10] KVM: arm64: Add a range to __pkvm_host_share_guest() Vincent Donnefort
2025-05-20 8:51 ` [PATCH v5 04/10] KVM: arm64: Add a range to __pkvm_host_unshare_guest() Vincent Donnefort
2025-05-20 8:51 ` [PATCH v5 05/10] KVM: arm64: Add a range to __pkvm_host_wrprotect_guest() Vincent Donnefort
2025-05-20 8:51 ` [PATCH v5 06/10] KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest() Vincent Donnefort
2025-05-20 8:51 ` [PATCH v5 07/10] KVM: arm64: Convert pkvm_mappings to interval tree Vincent Donnefort
2025-05-20 8:51 ` [PATCH v5 08/10] KVM: arm64: Add a range to pkvm_mappings Vincent Donnefort
2025-05-20 8:52 ` [PATCH v5 09/10] KVM: arm64: Stage-2 huge mappings for np-guests Vincent Donnefort
2025-05-20 8:52 ` [PATCH v5 10/10] KVM: arm64: np-guest CMOs with PMD_SIZE fixmap Vincent Donnefort
2025-05-21 11:01 ` Marc Zyngier
2025-05-21 11:43 ` Vincent Donnefort
2025-05-21 12:04 ` Marc Zyngier