[PATCH v2 00/18] KVM: arm64: Non-protected guest stage-2 support for pKVM

linux-arm-kernel.lists.infradead.org archive mirror

* [PATCH v2 00/18] KVM: arm64: Non-protected guest stage-2 support for pKVM
@ 2024-12-03 10:37 Quentin Perret
  2024-12-03 10:37 ` [PATCH v2 01/18] KVM: arm64: Change the layout of enum pkvm_page_state Quentin Perret
                   ` (17 more replies)
  0 siblings, 18 replies; 50+ messages in thread
From: Quentin Perret @ 2024-12-03 10:37 UTC
  To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon
  Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
	kvmarm, linux-kernel

Hi all,

This is the v2 of the series adding support for non-protected guests to
pKVM. Please refer to v1 for all the context:

  https://lore.kernel.org/kvmarm/20241104133204.85208-1-qperret@google.com/

The series is organized as follows:

 - Patches 01 to 04 move the host ownership state tracking from the
   host's stage-2 page-table to the hypervisor's vmemmap. This avoids
   fragmenting the host stage-2 for shared pages, where block mappings
   currently have to be split only to store an annotation in the SW bits
   of the corresponding PTE. All pages mapped into non-protected guests
   are shared from pKVM's point of view, so the cost of stage-2
   fragmentation would increase massively once we start tracking them at
   EL2. Note that these patches also help with the existing sharing for
   e.g. FF-A, so they could possibly be merged separately from the rest
   of the series.

 - Patches 05 to 07 implement a minor refactoring of the pgtable code to
   ease the integration of the pKVM MMU later on.

 - Patches 08 to 16 introduce all the infrastructure needed on the pKVM
   side for handling guest stage-2 page-tables at EL2.

 - Patches 17 and 18 plumb the newly introduced pKVM support into
   KVM/arm64.
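
To put a rough number on the fragmentation mentioned for patches 01 to
04: with SW-bit tracking, sharing a single page that sits inside a 2MiB
block mapping means replacing the block descriptor with a full table of
page-sized entries, just so that one entry can carry a different
annotation. A minimal sketch of that arithmetic, assuming a 4KiB
granule and 8-byte descriptors (the helper names are hypothetical, not
taken from the series):

```c
#include <assert.h>

/* Assumptions: 4KiB translation granule and 8-byte stage-2 descriptors. */
#define GRANULE_SIZE	4096UL
#define DESC_SIZE	8UL
#define SZ_2M		(2UL << 20)

/*
 * Number of last-level descriptors needed to describe @block_size once a
 * block mapping has been broken down to page granularity.
 */
static unsigned long descs_after_split(unsigned long block_size)
{
	assert(block_size % GRANULE_SIZE == 0);
	return block_size / GRANULE_SIZE;
}

/* Memory consumed by the new last-level table backing the split block. */
static unsigned long table_bytes_after_split(unsigned long block_size)
{
	return descs_after_split(block_size) * DESC_SIZE;
}
```

So one shared page costs 512 descriptors and a whole extra table page,
whereas tracking the state in the vmemmap leaves the block mapping
intact and only touches the per-page metadata.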

Patches are based on 6.13-rc1, tested on Pixel 6 and QEMU.

Changes in v2:
 - Rebased on 6.13-rc1 (small conflicts with 2362506f7cff ("KVM: arm64:
   Don't mark "struct page" accessed when making SPTE young") in
   particular)
 - Fixed kerneldoc breakage for __unmap_stage2_range()
 - Fixed pkvm_pgtable_test_clear_young() to use correct HVC
 - Folded guest_get_valid_pte() into __check_host_unshare_guest() for
   clarity

Thanks,
Quentin

Marc Zyngier (1):
  KVM: arm64: Introduce __pkvm_vcpu_{load,put}()

Quentin Perret (17):
  KVM: arm64: Change the layout of enum pkvm_page_state
  KVM: arm64: Move enum pkvm_page_state to memory.h
  KVM: arm64: Make hyp_page::order a u8
  KVM: arm64: Move host page ownership tracking to the hyp vmemmap
  KVM: arm64: Pass walk flags to kvm_pgtable_stage2_mkyoung
  KVM: arm64: Pass walk flags to kvm_pgtable_stage2_relax_perms
  KVM: arm64: Make kvm_pgtable_stage2_init() a static inline function
  KVM: arm64: Add {get,put}_pkvm_hyp_vm() helpers
  KVM: arm64: Introduce __pkvm_host_share_guest()
  KVM: arm64: Introduce __pkvm_host_unshare_guest()
  KVM: arm64: Introduce __pkvm_host_relax_guest_perms()
  KVM: arm64: Introduce __pkvm_host_wrprotect_guest()
  KVM: arm64: Introduce __pkvm_host_test_clear_young_guest()
  KVM: arm64: Introduce __pkvm_host_mkyoung_guest()
  KVM: arm64: Introduce __pkvm_tlb_flush_vmid()
  KVM: arm64: Introduce the EL1 pKVM MMU
  KVM: arm64: Plumb the pKVM MMU in KVM

 arch/arm64/include/asm/kvm_asm.h              |   9 +
 arch/arm64/include/asm/kvm_host.h             |   4 +
 arch/arm64/include/asm/kvm_pgtable.h          |  42 ++-
 arch/arm64/include/asm/kvm_pkvm.h             |  28 ++
 arch/arm64/kvm/arm.c                          |  23 +-
 arch/arm64/kvm/hyp/include/nvhe/gfp.h         |   6 +-
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  38 +--
 arch/arm64/kvm/hyp/include/nvhe/memory.h      |  43 ++-
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h        |  15 +
 arch/arm64/kvm/hyp/nvhe/hyp-main.c            | 210 +++++++++++-
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 312 ++++++++++++++++--
 arch/arm64/kvm/hyp/nvhe/page_alloc.c          |  14 +-
 arch/arm64/kvm/hyp/nvhe/pkvm.c                |  56 ++++
 arch/arm64/kvm/hyp/nvhe/setup.c               |   7 +-
 arch/arm64/kvm/hyp/pgtable.c                  |  13 +-
 arch/arm64/kvm/mmu.c                          | 109 ++++--
 arch/arm64/kvm/pkvm.c                         | 195 +++++++++++
 arch/arm64/kvm/vgic/vgic-v3.c                 |   6 +-
 18 files changed, 987 insertions(+), 143 deletions(-)

-- 
2.47.0.338.g60cca15819-goog




* [PATCH v2 01/18] KVM: arm64: Change the layout of enum pkvm_page_state
  2024-12-03 10:37 [PATCH v2 00/18] KVM: arm64: Non-protected guest stage-2 support for pKVM Quentin Perret
@ 2024-12-03 10:37 ` Quentin Perret
  2024-12-10 12:59   ` Fuad Tabba
  2024-12-03 10:37 ` [PATCH v2 02/18] KVM: arm64: Move enum pkvm_page_state to memory.h Quentin Perret
                   ` (16 subsequent siblings)
  17 siblings, 1 reply; 50+ messages in thread
From: Quentin Perret @ 2024-12-03 10:37 UTC
  To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon
  Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
	kvmarm, linux-kernel

The 'concrete' (a.k.a. non-meta) page states are currently encoded using
software bits in PTEs. For performance reasons, the abstract
pkvm_page_state enum uses the same bits to encode these states, as that
makes conversions to and from PTEs easy.

In order to prepare the ground for moving the 'concrete' state storage
to the hyp vmemmap, re-arrange the enum to use bits 0 and 1 for this
purpose.
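
With the new layout the concrete states live in bits 0 and 1 of the
enum and are only shifted into the PTE SW bits when building a prot
value. The user-space sketch below mirrors pkvm_mkstate() and
pkvm_getstate() from the diff in this patch; the SW0/SW1 positions (bits
55 and 56) and the simplified FIELD_PREP()/FIELD_GET() macros (built on
the GCC/Clang __builtin_ctzll()) are assumptions made so that it
compiles outside the kernel:

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n)	(1ULL << (n))

/* Assumed positions of the stage-2 software bits used for page state. */
#define KVM_PGTABLE_PROT_SW0		BIT(55)
#define KVM_PGTABLE_PROT_SW1		BIT(56)
#define PKVM_PAGE_STATE_PROT_MASK	(KVM_PGTABLE_PROT_SW0 | KVM_PGTABLE_PROT_SW1)

/* Simplified stand-ins for the kernel's FIELD_PREP()/FIELD_GET(). */
#define FIELD_PREP(mask, val)	(((uint64_t)(val) << __builtin_ctzll(mask)) & (mask))
#define FIELD_GET(mask, reg)	(((reg) & (mask)) >> __builtin_ctzll(mask))

enum pkvm_page_state {
	PKVM_PAGE_OWNED			= 0,
	PKVM_PAGE_SHARED_OWNED		= BIT(0),
	PKVM_PAGE_SHARED_BORROWED	= BIT(1),
	PKVM_NOPAGE			= BIT(2),	/* meta-state, never stored in a PTE */
};
#define PKVM_PAGE_META_STATES_MASK	(~(BIT(0) | BIT(1)))

static uint64_t pkvm_mkstate(uint64_t prot, enum pkvm_page_state state)
{
	assert(!(state & PKVM_PAGE_META_STATES_MASK));	/* stands in for BUG_ON() */
	prot &= ~PKVM_PAGE_STATE_PROT_MASK;
	prot |= FIELD_PREP(PKVM_PAGE_STATE_PROT_MASK, state);
	return prot;
}

static enum pkvm_page_state pkvm_getstate(uint64_t prot)
{
	return FIELD_GET(PKVM_PAGE_STATE_PROT_MASK, prot);
}
```

The round trip is unchanged for PTE users, while the enum values become
small enough to be stored directly in a per-page structure.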

No functional changes intended.

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 0972faccc2af..ca3177481b78 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -24,25 +24,28 @@
  */
 enum pkvm_page_state {
 	PKVM_PAGE_OWNED			= 0ULL,
-	PKVM_PAGE_SHARED_OWNED		= KVM_PGTABLE_PROT_SW0,
-	PKVM_PAGE_SHARED_BORROWED	= KVM_PGTABLE_PROT_SW1,
-	__PKVM_PAGE_RESERVED		= KVM_PGTABLE_PROT_SW0 |
-					  KVM_PGTABLE_PROT_SW1,
+	PKVM_PAGE_SHARED_OWNED		= BIT(0),
+	PKVM_PAGE_SHARED_BORROWED	= BIT(1),
+	__PKVM_PAGE_RESERVED		= BIT(0) | BIT(1),
 
 	/* Meta-states which aren't encoded directly in the PTE's SW bits */
-	PKVM_NOPAGE,
+	PKVM_NOPAGE			= BIT(2),
 };
+#define PKVM_PAGE_META_STATES_MASK	(~(BIT(0) | BIT(1)))
 
 #define PKVM_PAGE_STATE_PROT_MASK	(KVM_PGTABLE_PROT_SW0 | KVM_PGTABLE_PROT_SW1)
 static inline enum kvm_pgtable_prot pkvm_mkstate(enum kvm_pgtable_prot prot,
 						 enum pkvm_page_state state)
 {
-	return (prot & ~PKVM_PAGE_STATE_PROT_MASK) | state;
+	BUG_ON(state & PKVM_PAGE_META_STATES_MASK);
+	prot &= ~PKVM_PAGE_STATE_PROT_MASK;
+	prot |= FIELD_PREP(PKVM_PAGE_STATE_PROT_MASK, state);
+	return prot;
 }
 
 static inline enum pkvm_page_state pkvm_getstate(enum kvm_pgtable_prot prot)
 {
-	return prot & PKVM_PAGE_STATE_PROT_MASK;
+	return FIELD_GET(PKVM_PAGE_STATE_PROT_MASK, prot);
 }
 
 struct host_mmu {
-- 
2.47.0.338.g60cca15819-goog




* [PATCH v2 02/18] KVM: arm64: Move enum pkvm_page_state to memory.h
  2024-12-03 10:37 [PATCH v2 00/18] KVM: arm64: Non-protected guest stage-2 support for pKVM Quentin Perret
  2024-12-03 10:37 ` [PATCH v2 01/18] KVM: arm64: Change the layout of enum pkvm_page_state Quentin Perret
@ 2024-12-03 10:37 ` Quentin Perret
  2024-12-03 10:37 ` [PATCH v2 03/18] KVM: arm64: Make hyp_page::order a u8 Quentin Perret
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 50+ messages in thread
From: Quentin Perret @ 2024-12-03 10:37 UTC
  To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon
  Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
	kvmarm, linux-kernel

In order to prepare the way for storing page-tracking information in
pKVM's vmemmap, move the enum pkvm_page_state definition to
nvhe/memory.h.

No functional changes intended.

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 35 +------------------
 arch/arm64/kvm/hyp/include/nvhe/memory.h      | 34 ++++++++++++++++++
 2 files changed, 35 insertions(+), 34 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index ca3177481b78..25038ac705d8 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -11,43 +11,10 @@
 #include <asm/kvm_mmu.h>
 #include <asm/kvm_pgtable.h>
 #include <asm/virt.h>
+#include <nvhe/memory.h>
 #include <nvhe/pkvm.h>
 #include <nvhe/spinlock.h>
 
-/*
- * SW bits 0-1 are reserved to track the memory ownership state of each page:
- *   00: The page is owned exclusively by the page-table owner.
- *   01: The page is owned by the page-table owner, but is shared
- *       with another entity.
- *   10: The page is shared with, but not owned by the page-table owner.
- *   11: Reserved for future use (lending).
- */
-enum pkvm_page_state {
-	PKVM_PAGE_OWNED			= 0ULL,
-	PKVM_PAGE_SHARED_OWNED		= BIT(0),
-	PKVM_PAGE_SHARED_BORROWED	= BIT(1),
-	__PKVM_PAGE_RESERVED		= BIT(0) | BIT(1),
-
-	/* Meta-states which aren't encoded directly in the PTE's SW bits */
-	PKVM_NOPAGE			= BIT(2),
-};
-#define PKVM_PAGE_META_STATES_MASK	(~(BIT(0) | BIT(1)))
-
-#define PKVM_PAGE_STATE_PROT_MASK	(KVM_PGTABLE_PROT_SW0 | KVM_PGTABLE_PROT_SW1)
-static inline enum kvm_pgtable_prot pkvm_mkstate(enum kvm_pgtable_prot prot,
-						 enum pkvm_page_state state)
-{
-	BUG_ON(state & PKVM_PAGE_META_STATES_MASK);
-	prot &= ~PKVM_PAGE_STATE_PROT_MASK;
-	prot |= FIELD_PREP(PKVM_PAGE_STATE_PROT_MASK, state);
-	return prot;
-}
-
-static inline enum pkvm_page_state pkvm_getstate(enum kvm_pgtable_prot prot)
-{
-	return FIELD_GET(PKVM_PAGE_STATE_PROT_MASK, prot);
-}
-
 struct host_mmu {
 	struct kvm_arch arch;
 	struct kvm_pgtable pgt;
diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
index ab205c4d6774..6dfeb000371c 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
@@ -7,6 +7,40 @@
 
 #include <linux/types.h>
 
+/*
+ * SW bits 0-1 are reserved to track the memory ownership state of each page:
+ *   00: The page is owned exclusively by the page-table owner.
+ *   01: The page is owned by the page-table owner, but is shared
+ *       with another entity.
+ *   10: The page is shared with, but not owned by the page-table owner.
+ *   11: Reserved for future use (lending).
+ */
+enum pkvm_page_state {
+	PKVM_PAGE_OWNED			= 0ULL,
+	PKVM_PAGE_SHARED_OWNED		= BIT(0),
+	PKVM_PAGE_SHARED_BORROWED	= BIT(1),
+	__PKVM_PAGE_RESERVED		= BIT(0) | BIT(1),
+
+	/* Meta-states which aren't encoded directly in the PTE's SW bits */
+	PKVM_NOPAGE			= BIT(2),
+};
+#define PKVM_PAGE_META_STATES_MASK	(~(BIT(0) | BIT(1)))
+
+#define PKVM_PAGE_STATE_PROT_MASK	(KVM_PGTABLE_PROT_SW0 | KVM_PGTABLE_PROT_SW1)
+static inline enum kvm_pgtable_prot pkvm_mkstate(enum kvm_pgtable_prot prot,
+						 enum pkvm_page_state state)
+{
+	BUG_ON(state & PKVM_PAGE_META_STATES_MASK);
+	prot &= ~PKVM_PAGE_STATE_PROT_MASK;
+	prot |= FIELD_PREP(PKVM_PAGE_STATE_PROT_MASK, state);
+	return prot;
+}
+
+static inline enum pkvm_page_state pkvm_getstate(enum kvm_pgtable_prot prot)
+{
+	return FIELD_GET(PKVM_PAGE_STATE_PROT_MASK, prot);
+}
+
 struct hyp_page {
 	unsigned short refcount;
 	unsigned short order;
-- 
2.47.0.338.g60cca15819-goog




* [PATCH v2 03/18] KVM: arm64: Make hyp_page::order a u8
  2024-12-03 10:37 [PATCH v2 00/18] KVM: arm64: Non-protected guest stage-2 support for pKVM Quentin Perret
  2024-12-03 10:37 ` [PATCH v2 01/18] KVM: arm64: Change the layout of enum pkvm_page_state Quentin Perret
  2024-12-03 10:37 ` [PATCH v2 02/18] KVM: arm64: Move enum pkvm_page_state to memory.h Quentin Perret
@ 2024-12-03 10:37 ` Quentin Perret
  2024-12-03 10:37 ` [PATCH v2 04/18] KVM: arm64: Move host page ownership tracking to the hyp vmemmap Quentin Perret
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 50+ messages in thread
From: Quentin Perret @ 2024-12-03 10:37 UTC
  To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon
  Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
	kvmarm, linux-kernel

We don't need 16 bits to store the hyp page order, and we'll need some
bits to store page ownership data soon, so let's shrink the order
member to a u8.
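
Shrinking the order does not grow the per-page metadata: the entry
stays 4 bytes, and one byte is freed for the ownership state introduced
by the next patch. A small sketch checking that, assuming
NR_PAGE_ORDERS is 11 (it is configuration-dependent in practice); the
hyp_order_is_valid() helper is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t u16;
typedef uint8_t u8;

/* Mirror of the reworked struct hyp_page from this patch. */
struct hyp_page {
	u16 refcount;
	u8 order;
	u8 reserved;	/* freed up for page ownership data later in the series */
};

#define HYP_NO_ORDER	0xff

/* Assumed number of buddy orders; the real value depends on the kernel config. */
#define NR_PAGE_ORDERS	11

/* The entry stays 4 bytes, and no valid order collides with the sentinel. */
static_assert(sizeof(struct hyp_page) == 4, "hyp_page should stay 4 bytes");
static_assert(NR_PAGE_ORDERS - 1 < HYP_NO_ORDER, "orders must not collide with HYP_NO_ORDER");

/* Returns 1 if @order is a real buddy order rather than the sentinel. */
static int hyp_order_is_valid(u8 order)
{
	return order != HYP_NO_ORDER && order < NR_PAGE_ORDERS;
}
```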

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/kvm/hyp/include/nvhe/gfp.h    |  6 +++---
 arch/arm64/kvm/hyp/include/nvhe/memory.h |  5 +++--
 arch/arm64/kvm/hyp/nvhe/page_alloc.c     | 14 +++++++-------
 3 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/gfp.h b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
index 97c527ef53c2..f1725bad6331 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/gfp.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
@@ -7,7 +7,7 @@
 #include <nvhe/memory.h>
 #include <nvhe/spinlock.h>
 
-#define HYP_NO_ORDER	USHRT_MAX
+#define HYP_NO_ORDER	0xff
 
 struct hyp_pool {
 	/*
@@ -19,11 +19,11 @@ struct hyp_pool {
 	struct list_head free_area[NR_PAGE_ORDERS];
 	phys_addr_t range_start;
 	phys_addr_t range_end;
-	unsigned short max_order;
+	u8 max_order;
 };
 
 /* Allocation */
-void *hyp_alloc_pages(struct hyp_pool *pool, unsigned short order);
+void *hyp_alloc_pages(struct hyp_pool *pool, u8 order);
 void hyp_split_page(struct hyp_page *page);
 void hyp_get_page(struct hyp_pool *pool, void *addr);
 void hyp_put_page(struct hyp_pool *pool, void *addr);
diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
index 6dfeb000371c..88cb8ff9e769 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
@@ -42,8 +42,9 @@ static inline enum pkvm_page_state pkvm_getstate(enum kvm_pgtable_prot prot)
 }
 
 struct hyp_page {
-	unsigned short refcount;
-	unsigned short order;
+	u16 refcount;
+	u8 order;
+	u8 reserved;
 };
 
 extern u64 __hyp_vmemmap;
diff --git a/arch/arm64/kvm/hyp/nvhe/page_alloc.c b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
index e691290d3765..a1eb27a1a747 100644
--- a/arch/arm64/kvm/hyp/nvhe/page_alloc.c
+++ b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
@@ -32,7 +32,7 @@ u64 __hyp_vmemmap;
  */
 static struct hyp_page *__find_buddy_nocheck(struct hyp_pool *pool,
 					     struct hyp_page *p,
-					     unsigned short order)
+					     u8 order)
 {
 	phys_addr_t addr = hyp_page_to_phys(p);
 
@@ -51,7 +51,7 @@ static struct hyp_page *__find_buddy_nocheck(struct hyp_pool *pool,
 /* Find a buddy page currently available for allocation */
 static struct hyp_page *__find_buddy_avail(struct hyp_pool *pool,
 					   struct hyp_page *p,
-					   unsigned short order)
+					   u8 order)
 {
 	struct hyp_page *buddy = __find_buddy_nocheck(pool, p, order);
 
@@ -94,7 +94,7 @@ static void __hyp_attach_page(struct hyp_pool *pool,
 			      struct hyp_page *p)
 {
 	phys_addr_t phys = hyp_page_to_phys(p);
-	unsigned short order = p->order;
+	u8 order = p->order;
 	struct hyp_page *buddy;
 
 	memset(hyp_page_to_virt(p), 0, PAGE_SIZE << p->order);
@@ -129,7 +129,7 @@ static void __hyp_attach_page(struct hyp_pool *pool,
 
 static struct hyp_page *__hyp_extract_page(struct hyp_pool *pool,
 					   struct hyp_page *p,
-					   unsigned short order)
+					   u8 order)
 {
 	struct hyp_page *buddy;
 
@@ -183,7 +183,7 @@ void hyp_get_page(struct hyp_pool *pool, void *addr)
 
 void hyp_split_page(struct hyp_page *p)
 {
-	unsigned short order = p->order;
+	u8 order = p->order;
 	unsigned int i;
 
 	p->order = 0;
@@ -195,10 +195,10 @@ void hyp_split_page(struct hyp_page *p)
 	}
 }
 
-void *hyp_alloc_pages(struct hyp_pool *pool, unsigned short order)
+void *hyp_alloc_pages(struct hyp_pool *pool, u8 order)
 {
-	unsigned short i = order;
 	struct hyp_page *p;
+	u8 i = order;
 
 	hyp_spin_lock(&pool->lock);
 
-- 
2.47.0.338.g60cca15819-goog




* [PATCH v2 04/18] KVM: arm64: Move host page ownership tracking to the hyp vmemmap
  2024-12-03 10:37 [PATCH v2 00/18] KVM: arm64: Non-protected guest stage-2 support for pKVM Quentin Perret
                   ` (2 preceding siblings ...)
  2024-12-03 10:37 ` [PATCH v2 03/18] KVM: arm64: Make hyp_page::order a u8 Quentin Perret
@ 2024-12-03 10:37 ` Quentin Perret
  2024-12-10 13:02   ` Fuad Tabba
  2024-12-03 10:37 ` [PATCH v2 05/18] KVM: arm64: Pass walk flags to kvm_pgtable_stage2_mkyoung Quentin Perret
                   ` (13 subsequent siblings)
  17 siblings, 1 reply; 50+ messages in thread
From: Quentin Perret @ 2024-12-03 10:37 UTC
  To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon
  Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
	kvmarm, linux-kernel

We currently store part of the page-tracking state in PTE software bits
for the host, guests and the hypervisor. This is sub-optimal when e.g.
sharing pages, as it forces us to break block mappings purely to support
this software tracking. In particular, the host ends up with an
unnecessarily fragmented stage-2 page-table when it shares pages with
Secure, which can lead to measurable regressions. Moreover, having this
state stored in the page-table forces us to do multiple costly walks on
the page transition path, hence causing overhead.

In order to work around these problems, move the host-side page-tracking
logic from SW bits in its stage-2 PTEs to the hypervisor's vmemmap.
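
Once the host state lives in the vmemmap, checking a range no longer
needs a page-table walk: it becomes a linear loop over per-page
entries. The sketch below models __host_update_page_state() and
__host_check_page_state_range() from the diff in user space; the flat
vmemmap array, NR_PAGES and phys_to_page() are hypothetical
simplifications, and locking and the range_is_allowed_memory() check are
omitted:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1ULL << PAGE_SHIFT)
#define NR_PAGES	16	/* hypothetical size of the tracked physical range */

enum pkvm_page_state {
	PKVM_PAGE_OWNED			= 0,
	PKVM_PAGE_SHARED_OWNED		= 1 << 0,
	PKVM_PAGE_SHARED_BORROWED	= 1 << 1,
	PKVM_NOPAGE			= 1 << 2,
};

/* Simplified per-page metadata; the series uses an 8-bit enum bitfield. */
struct hyp_page {
	uint16_t refcount;
	uint8_t order;
	uint8_t host_state;
};

/* Stand-in for the hyp vmemmap, indexed by PFN starting at physical address 0. */
static struct hyp_page vmemmap[NR_PAGES];

static struct hyp_page *phys_to_page(uint64_t phys)
{
	assert((phys >> PAGE_SHIFT) < NR_PAGES);
	return &vmemmap[phys >> PAGE_SHIFT];
}

/* Update the host state of every page in [addr, addr + size). */
static void host_update_page_state(uint64_t addr, uint64_t size,
				   enum pkvm_page_state state)
{
	for (uint64_t end = addr + size; addr < end; addr += PAGE_SIZE)
		phys_to_page(addr)->host_state = state;
}

/* Return 0 if every page in the range is in @state, -EPERM otherwise. */
static int host_check_page_state_range(uint64_t addr, uint64_t size,
				       enum pkvm_page_state state)
{
	for (uint64_t end = addr + size; addr < end; addr += PAGE_SIZE) {
		if (phys_to_page(addr)->host_state != state)
			return -EPERM;
	}
	return 0;
}
```

Because the zero-initialised array starts out as PKVM_PAGE_OWNED, the
model matches the default of the host owning all of its memory.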

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/kvm/hyp/include/nvhe/memory.h |  6 +-
 arch/arm64/kvm/hyp/nvhe/mem_protect.c    | 94 ++++++++++++++++--------
 arch/arm64/kvm/hyp/nvhe/setup.c          |  7 +-
 3 files changed, 71 insertions(+), 36 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
index 88cb8ff9e769..08f3a0416d4c 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
@@ -8,7 +8,7 @@
 #include <linux/types.h>
 
 /*
- * SW bits 0-1 are reserved to track the memory ownership state of each page:
+ * Bits 0-1 are reserved to track the memory ownership state of each page:
  *   00: The page is owned exclusively by the page-table owner.
  *   01: The page is owned by the page-table owner, but is shared
  *       with another entity.
@@ -44,7 +44,9 @@ static inline enum pkvm_page_state pkvm_getstate(enum kvm_pgtable_prot prot)
 struct hyp_page {
 	u16 refcount;
 	u8 order;
-	u8 reserved;
+
+	/* Host (non-meta) state. Guarded by the host stage-2 lock. */
+	enum pkvm_page_state host_state : 8;
 };
 
 extern u64 __hyp_vmemmap;
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index caba3e4bd09e..1595081c4f6b 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -201,8 +201,8 @@ static void *guest_s2_zalloc_page(void *mc)
 
 	memset(addr, 0, PAGE_SIZE);
 	p = hyp_virt_to_page(addr);
-	memset(p, 0, sizeof(*p));
 	p->refcount = 1;
+	p->order = 0;
 
 	return addr;
 }
@@ -268,6 +268,7 @@ int kvm_guest_prepare_stage2(struct pkvm_hyp_vm *vm, void *pgd)
 
 void reclaim_guest_pages(struct pkvm_hyp_vm *vm, struct kvm_hyp_memcache *mc)
 {
+	struct hyp_page *page;
 	void *addr;
 
 	/* Dump all pgtable pages in the hyp_pool */
@@ -279,7 +280,9 @@ void reclaim_guest_pages(struct pkvm_hyp_vm *vm, struct kvm_hyp_memcache *mc)
 	/* Drain the hyp_pool into the memcache */
 	addr = hyp_alloc_pages(&vm->pool, 0);
 	while (addr) {
-		memset(hyp_virt_to_page(addr), 0, sizeof(struct hyp_page));
+		page = hyp_virt_to_page(addr);
+		page->refcount = 0;
+		page->order = 0;
 		push_hyp_memcache(mc, addr, hyp_virt_to_phys);
 		WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(addr), 1));
 		addr = hyp_alloc_pages(&vm->pool, 0);
@@ -382,19 +385,25 @@ bool addr_is_memory(phys_addr_t phys)
 	return !!find_mem_range(phys, &range);
 }
 
-static bool addr_is_allowed_memory(phys_addr_t phys)
+static bool is_in_mem_range(u64 addr, struct kvm_mem_range *range)
+{
+	return range->start <= addr && addr < range->end;
+}
+
+static int range_is_allowed_memory(u64 start, u64 end)
 {
 	struct memblock_region *reg;
 	struct kvm_mem_range range;
 
-	reg = find_mem_range(phys, &range);
+	/* Can't check the state of both MMIO and memory regions at once */
+	reg = find_mem_range(start, &range);
+	if (!is_in_mem_range(end - 1, &range))
+		return -EINVAL;
 
-	return reg && !(reg->flags & MEMBLOCK_NOMAP);
-}
+	if (!reg || reg->flags & MEMBLOCK_NOMAP)
+		return -EPERM;
 
-static bool is_in_mem_range(u64 addr, struct kvm_mem_range *range)
-{
-	return range->start <= addr && addr < range->end;
+	return 0;
 }
 
 static bool range_is_memory(u64 start, u64 end)
@@ -454,8 +463,11 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
 	if (kvm_pte_valid(pte))
 		return -EAGAIN;
 
-	if (pte)
+	if (pte) {
+		WARN_ON(addr_is_memory(addr) &&
+			!(hyp_phys_to_page(addr)->host_state & PKVM_NOPAGE));
 		return -EPERM;
+	}
 
 	do {
 		u64 granule = kvm_granule_size(level);
@@ -477,10 +489,29 @@ int host_stage2_idmap_locked(phys_addr_t addr, u64 size,
 	return host_stage2_try(__host_stage2_idmap, addr, addr + size, prot);
 }
 
+static void __host_update_page_state(phys_addr_t addr, u64 size, enum pkvm_page_state state)
+{
+	phys_addr_t end = addr + size;
+	for (; addr < end; addr += PAGE_SIZE)
+		hyp_phys_to_page(addr)->host_state = state;
+}
+
 int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
 {
-	return host_stage2_try(kvm_pgtable_stage2_set_owner, &host_mmu.pgt,
-			       addr, size, &host_s2_pool, owner_id);
+	int ret;
+
+	ret = host_stage2_try(kvm_pgtable_stage2_set_owner, &host_mmu.pgt,
+			      addr, size, &host_s2_pool, owner_id);
+	if (ret || !addr_is_memory(addr))
+		return ret;
+
+	/* Don't forget to update the vmemmap tracking for the host */
+	if (owner_id == PKVM_ID_HOST)
+		__host_update_page_state(addr, size, PKVM_PAGE_OWNED);
+	else
+		__host_update_page_state(addr, size, PKVM_NOPAGE);
+
+	return 0;
 }
 
 static bool host_stage2_force_pte_cb(u64 addr, u64 end, enum kvm_pgtable_prot prot)
@@ -604,35 +635,38 @@ static int check_page_state_range(struct kvm_pgtable *pgt, u64 addr, u64 size,
 	return kvm_pgtable_walk(pgt, addr, size, &walker);
 }
 
-static enum pkvm_page_state host_get_page_state(kvm_pte_t pte, u64 addr)
-{
-	if (!addr_is_allowed_memory(addr))
-		return PKVM_NOPAGE;
-
-	if (!kvm_pte_valid(pte) && pte)
-		return PKVM_NOPAGE;
-
-	return pkvm_getstate(kvm_pgtable_stage2_pte_prot(pte));
-}
-
 static int __host_check_page_state_range(u64 addr, u64 size,
 					 enum pkvm_page_state state)
 {
-	struct check_walk_data d = {
-		.desired	= state,
-		.get_page_state	= host_get_page_state,
-	};
+	u64 end = addr + size;
+	int ret;
+
+	ret = range_is_allowed_memory(addr, end);
+	if (ret)
+		return ret;
 
 	hyp_assert_lock_held(&host_mmu.lock);
-	return check_page_state_range(&host_mmu.pgt, addr, size, &d);
+	for (; addr < end; addr += PAGE_SIZE) {
+		if (hyp_phys_to_page(addr)->host_state != state)
+			return -EPERM;
+	}
+
+	return 0;
 }
 
 static int __host_set_page_state_range(u64 addr, u64 size,
 				       enum pkvm_page_state state)
 {
-	enum kvm_pgtable_prot prot = pkvm_mkstate(PKVM_HOST_MEM_PROT, state);
+	if (hyp_phys_to_page(addr)->host_state & PKVM_NOPAGE) {
+		int ret = host_stage2_idmap_locked(addr, size, PKVM_HOST_MEM_PROT);
 
-	return host_stage2_idmap_locked(addr, size, prot);
+		if (ret)
+			return ret;
+	}
+
+	__host_update_page_state(addr, size, state);
+
+	return 0;
 }
 
 static int host_request_owned_transition(u64 *completer_addr,
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index cbdd18cd3f98..7e04d1c2a03d 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -180,7 +180,6 @@ static void hpool_put_page(void *addr)
 static int fix_host_ownership_walker(const struct kvm_pgtable_visit_ctx *ctx,
 				     enum kvm_pgtable_walk_flags visit)
 {
-	enum kvm_pgtable_prot prot;
 	enum pkvm_page_state state;
 	phys_addr_t phys;
 
@@ -203,16 +202,16 @@ static int fix_host_ownership_walker(const struct kvm_pgtable_visit_ctx *ctx,
 	case PKVM_PAGE_OWNED:
 		return host_stage2_set_owner_locked(phys, PAGE_SIZE, PKVM_ID_HYP);
 	case PKVM_PAGE_SHARED_OWNED:
-		prot = pkvm_mkstate(PKVM_HOST_MEM_PROT, PKVM_PAGE_SHARED_BORROWED);
+		hyp_phys_to_page(phys)->host_state = PKVM_PAGE_SHARED_BORROWED;
 		break;
 	case PKVM_PAGE_SHARED_BORROWED:
-		prot = pkvm_mkstate(PKVM_HOST_MEM_PROT, PKVM_PAGE_SHARED_OWNED);
+		hyp_phys_to_page(phys)->host_state = PKVM_PAGE_SHARED_OWNED;
 		break;
 	default:
 		return -EINVAL;
 	}
 
-	return host_stage2_idmap_locked(phys, PAGE_SIZE, prot);
+	return 0;
 }
 
 static int fix_hyp_pgtable_refcnt_walker(const struct kvm_pgtable_visit_ctx *ctx,
-- 
2.47.0.338.g60cca15819-goog




* [PATCH v2 05/18] KVM: arm64: Pass walk flags to kvm_pgtable_stage2_mkyoung
  2024-12-03 10:37 [PATCH v2 00/18] KVM: arm64: Non-protected guest stage-2 support for pKVM Quentin Perret
                   ` (3 preceding siblings ...)
  2024-12-03 10:37 ` [PATCH v2 04/18] KVM: arm64: Move host page ownership tracking to the hyp vmemmap Quentin Perret
@ 2024-12-03 10:37 ` Quentin Perret
  2024-12-03 10:37 ` [PATCH v2 06/18] KVM: arm64: Pass walk flags to kvm_pgtable_stage2_relax_perms Quentin Perret
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 50+ messages in thread
From: Quentin Perret @ 2024-12-03 10:37 UTC
  To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon
  Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
	kvmarm, linux-kernel

kvm_pgtable_stage2_mkyoung currently assumes that it is being called
from a 'shared' walker, which will not be true once called from pKVM.
To allow for the re-use of that function, make the walk flags one of
its parameters.
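
The flags matter because they decide how the walker may update an
entry. The user-space model below is a simplification, assuming (as the
existing KVM/arm64 page-table code appears to do) that a shared walker
must publish PTE updates with compare-and-exchange because it can race
with other walkers, while an exclusive walker can store directly. The
flag values and the mkyoung() helper are hypothetical; only the AF bit
position (bit 10) follows the architecture:

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

typedef uint64_t kvm_pte_t;

/* Hypothetical flag values; only the names echo the kernel's walk flags. */
enum walk_flags {
	WALK_SHARED		= 1 << 0,
	WALK_HANDLE_FAULT	= 1 << 1,
};

#define PTE_AF	(1ULL << 10)	/* access flag in a stage-2 leaf descriptor */

/*
 * Set the access flag in *ptep. A shared walker uses compare-and-exchange
 * and returns -EAGAIN if the entry changed under it; an exclusive walker
 * (e.g. one holding a lock that excludes all other walkers) performs a
 * single unconditional store.
 */
static int mkyoung(_Atomic kvm_pte_t *ptep, enum walk_flags flags)
{
	kvm_pte_t old, new;

	assert(ptep);
	old = atomic_load(ptep);
	new = old | PTE_AF;

	if (!(flags & WALK_SHARED)) {
		atomic_store(ptep, new);
		return 0;
	}

	return atomic_compare_exchange_strong(ptep, &old, new) ? 0 : -EAGAIN;
}
```

The host fault path keeps passing the shared flags, while an EL2 caller
that already serialises access to the guest stage-2 could pass 0.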

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/kvm_pgtable.h | 4 +++-
 arch/arm64/kvm/hyp/pgtable.c         | 7 +++----
 arch/arm64/kvm/mmu.c                 | 3 ++-
 3 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index aab04097b505..38b7ec1c8614 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -669,13 +669,15 @@ int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size);
  * kvm_pgtable_stage2_mkyoung() - Set the access flag in a page-table entry.
  * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init*().
  * @addr:	Intermediate physical address to identify the page-table entry.
+ * @flags:	Flags to control the page-table walk (ex. a shared walk)
  *
  * The offset of @addr within a page is ignored.
  *
  * If there is a valid, leaf page-table entry used to translate @addr, then
  * set the access flag in that entry.
  */
-void kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr);
+void kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr,
+				enum kvm_pgtable_walk_flags flags);
 
 /**
  * kvm_pgtable_stage2_test_clear_young() - Test and optionally clear the access
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 40bd55966540..0470aedb4bf4 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -1245,14 +1245,13 @@ int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
 					NULL, NULL, 0);
 }
 
-void kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr)
+void kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr,
+				enum kvm_pgtable_walk_flags flags)
 {
 	int ret;
 
 	ret = stage2_update_leaf_attrs(pgt, addr, 1, KVM_PTE_LEAF_ATTR_LO_S2_AF, 0,
-				       NULL, NULL,
-				       KVM_PGTABLE_WALK_HANDLE_FAULT |
-				       KVM_PGTABLE_WALK_SHARED);
+				       NULL, NULL, flags);
 	if (!ret)
 		dsb(ishst);
 }
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index c9d46ad57e52..a2339b76c826 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1718,13 +1718,14 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 /* Resolve the access fault by making the page young again. */
 static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
 {
+	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED;
 	struct kvm_s2_mmu *mmu;
 
 	trace_kvm_access_fault(fault_ipa);
 
 	read_lock(&vcpu->kvm->mmu_lock);
 	mmu = vcpu->arch.hw_mmu;
-	kvm_pgtable_stage2_mkyoung(mmu->pgt, fault_ipa);
+	kvm_pgtable_stage2_mkyoung(mmu->pgt, fault_ipa, flags);
 	read_unlock(&vcpu->kvm->mmu_lock);
 }
 
-- 
2.47.0.338.g60cca15819-goog




* [PATCH v2 06/18] KVM: arm64: Pass walk flags to kvm_pgtable_stage2_relax_perms
  2024-12-03 10:37 [PATCH v2 00/18] KVM: arm64: Non-protected guest stage-2 support for pKVM Quentin Perret
                   ` (4 preceding siblings ...)
  2024-12-03 10:37 ` [PATCH v2 05/18] KVM: arm64: Pass walk flags to kvm_pgtable_stage2_mkyoung Quentin Perret
@ 2024-12-03 10:37 ` Quentin Perret
  2024-12-03 10:37 ` [PATCH v2 07/18] KVM: arm64: Make kvm_pgtable_stage2_init() a static inline function Quentin Perret
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 50+ messages in thread
From: Quentin Perret @ 2024-12-03 10:37 UTC
  To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon
  Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
	kvmarm, linux-kernel

kvm_pgtable_stage2_relax_perms currently assumes that it is being called
from a 'shared' walker, which will not be true once called from pKVM. To
allow for the re-use of that function, make the walk flags one of its
parameters.
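
The walk flags do not change what relax_perms computes, only how the
walker applies it. The sketch below models the part that stays fixed:
translating the extra permissions into set/clr attribute masks (the hunk
in this patch shows KVM_PGTABLE_PROT_X clearing the XN bit), and simply
carries the caller's flags alongside. The bit positions (S2AP at bits 6
and 7, a single XN bit at 54), the names and struct relax_update are
assumptions for illustration only:

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n)	(1ULL << (n))

/* Hypothetical prot bits, loosely modelled on enum kvm_pgtable_prot. */
#define PROT_X	BIT(0)
#define PROT_W	BIT(1)
#define PROT_R	BIT(2)

/* Assumed stage-2 leaf attribute positions. */
#define S2AP_R	BIT(6)
#define S2AP_W	BIT(7)
#define S2_XN	BIT(54)

/* Hypothetical walk flag, passed through from the caller. */
#define WALK_SHARED	BIT(0)

struct relax_update {
	uint64_t set;		/* attribute bits to set */
	uint64_t clr;		/* attribute bits to clear */
	uint64_t flags;		/* walk flags chosen by the caller */
};

/*
 * Translate extra permissions into set/clr masks. Relaxing only ever adds
 * access: R and W set S2AP bits, X clears the execute-never bit.
 */
static struct relax_update relax_perms(uint64_t prot, uint64_t flags)
{
	struct relax_update u = { .set = 0, .clr = 0, .flags = flags };

	if (prot & PROT_R)
		u.set |= S2AP_R;
	if (prot & PROT_W)
		u.set |= S2AP_W;
	if (prot & PROT_X)
		u.clr |= S2_XN;
	return u;
}

/* Apply the masks to a leaf entry. */
static uint64_t apply_update(uint64_t pte, struct relax_update u)
{
	return (pte & ~u.clr) | u.set;
}
```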

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/kvm_pgtable.h | 4 +++-
 arch/arm64/kvm/hyp/pgtable.c         | 6 ++----
 arch/arm64/kvm/mmu.c                 | 7 +++----
 3 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 38b7ec1c8614..c2f4149283ef 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -707,6 +707,7 @@ bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
  * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init*().
  * @addr:	Intermediate physical address to identify the page-table entry.
  * @prot:	Additional permissions to grant for the mapping.
+ * @flags:	Flags to control the page-table walk (ex. a shared walk)
  *
  * The offset of @addr within a page is ignored.
  *
@@ -719,7 +720,8 @@ bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
  * Return: 0 on success, negative error code on failure.
  */
 int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
-				   enum kvm_pgtable_prot prot);
+				   enum kvm_pgtable_prot prot,
+				   enum kvm_pgtable_walk_flags flags);
 
 /**
  * kvm_pgtable_stage2_flush_range() - Clean and invalidate data cache to Point
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 0470aedb4bf4..b7a3b5363235 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -1307,7 +1307,7 @@ bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
 }
 
 int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
-				   enum kvm_pgtable_prot prot)
+				   enum kvm_pgtable_prot prot, enum kvm_pgtable_walk_flags flags)
 {
 	int ret;
 	s8 level;
@@ -1325,9 +1325,7 @@ int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
 	if (prot & KVM_PGTABLE_PROT_X)
 		clr |= KVM_PTE_LEAF_ATTR_HI_S2_XN;
 
-	ret = stage2_update_leaf_attrs(pgt, addr, 1, set, clr, NULL, &level,
-				       KVM_PGTABLE_WALK_HANDLE_FAULT |
-				       KVM_PGTABLE_WALK_SHARED);
+	ret = stage2_update_leaf_attrs(pgt, addr, 1, set, clr, NULL, &level, flags);
 	if (!ret || ret == -EAGAIN)
 		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa_nsh, pgt->mmu, addr, level);
 	return ret;
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index a2339b76c826..641e4fec1659 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1452,6 +1452,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
 	struct kvm_pgtable *pgt;
 	struct page *page;
+	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED;
 
 	if (fault_is_perm)
 		fault_granule = kvm_vcpu_trap_get_perm_fault_granule(vcpu);
@@ -1695,13 +1696,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		 * PTE, which will be preserved.
 		 */
 		prot &= ~KVM_NV_GUEST_MAP_SZ;
-		ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
+		ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot, flags);
 	} else {
 		ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
 					     __pfn_to_phys(pfn), prot,
-					     memcache,
-					     KVM_PGTABLE_WALK_HANDLE_FAULT |
-					     KVM_PGTABLE_WALK_SHARED);
+					     memcache, flags);
 	}
 
 out_unlock:
-- 
2.47.0.338.g60cca15819-goog
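The walk flags matter because a shared walker may race with other walkers. It therefore has to update an entry with a compare-and-exchange that can fail, whereas an exclusive walker can simply store the new value. One example of an exclusive walker would be pKVM updating a guest stage-2 under its own lock. The following is a minimal user-space sketch of that distinction, not KVM code. The names relax_perms(), WALK_SHARED, WALK_HANDLE_FAULT and the single-word "PTE" are hypothetical simplifications loosely modelled on enum kvm_pgtable_walk_flags, and the failing path stands in for -EAGAIN.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical walk flags, loosely modelled on enum kvm_pgtable_walk_flags. */
enum walk_flags {
	WALK_SHARED       = 1u << 0, /* other walkers may race with this one */
	WALK_HANDLE_FAULT = 1u << 1, /* walk runs on the fault-handling path */
};

/* Hypothetical permission bits stored directly in a leaf entry. */
enum prot_bits {
	PROT_R = 1u << 0,
	PROT_W = 1u << 1,
	PROT_X = 1u << 2,
};

/*
 * Add (never remove) permissions on a single leaf entry. The caller says
 * via @flags whether the walk is shared; the function no longer assumes it.
 * Returns 0 on success, or -1 (think -EAGAIN) if a shared walker lost a race.
 */
static int relax_perms(_Atomic uint64_t *pte, uint64_t prot, unsigned int flags)
{
	uint64_t old, new;

	assert((prot & ~(uint64_t)(PROT_R | PROT_W | PROT_X)) == 0);

	old = atomic_load_explicit(pte, memory_order_relaxed);
	new = old | prot;

	if (flags & WALK_SHARED) {
		/* Concurrent walkers: only install the update if nobody beat us. */
		return atomic_compare_exchange_strong(pte, &old, new) ? 0 : -1;
	}

	/* Exclusive walker (e.g. caller holds a VM-wide lock): plain store. */
	atomic_store_explicit(pte, new, memory_order_relaxed);
	return 0;
}
```

Callers then choose the flags themselves. user_mem_abort() does the same after this patch by passing KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED explicitly.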




* [PATCH v2 07/18] KVM: arm64: Make kvm_pgtable_stage2_init() a static inline function
  2024-12-03 10:37 [PATCH v2 00/18] KVM: arm64: Non-protected guest stage-2 support for pKVM Quentin Perret
                   ` (5 preceding siblings ...)
  2024-12-03 10:37 ` [PATCH v2 06/18] KVM: arm64: Pass walk flags to kvm_pgtable_stage2_relax_perms Quentin Perret
@ 2024-12-03 10:37 ` Quentin Perret
  2024-12-03 10:37 ` [PATCH v2 08/18] KVM: arm64: Add {get,put}_pkvm_hyp_vm() helpers Quentin Perret
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 50+ messages in thread
From: Quentin Perret @ 2024-12-03 10:37 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon
  Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
	kvmarm, linux-kernel

Turn kvm_pgtable_stage2_init() into a static inline function instead of
a macro. This will allow typeof() to be used on it later on.

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/kvm_pgtable.h | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index c2f4149283ef..04418b5e3004 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -526,8 +526,11 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
 			      enum kvm_pgtable_stage2_flags flags,
 			      kvm_pgtable_force_pte_cb_t force_pte_cb);
 
-#define kvm_pgtable_stage2_init(pgt, mmu, mm_ops) \
-	__kvm_pgtable_stage2_init(pgt, mmu, mm_ops, 0, NULL)
+static inline int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
+					  struct kvm_pgtable_mm_ops *mm_ops)
+{
+	return __kvm_pgtable_stage2_init(pgt, mmu, mm_ops, 0, NULL);
+}
 
 /**
  * kvm_pgtable_stage2_destroy() - Destroy an unused guest stage-2 page-table.
-- 
2.47.0.338.g60cca15819-goog
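The name of a function-like macro that is not followed by '(' is never expanded. That leaves an undeclared identifier, so typeof() cannot be applied to such a macro. A static inline wrapper, by contrast, is an ordinary function with a prototype. The sketch below uses the hypothetical struct pgt, pgt_init_full() and pgt_init(), and the GCC/Clang __typeof__ spelling, to derive a function-pointer type from the wrapper's signature. It does not attempt to show how the series itself uses typeof() later on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page-table descriptor. */
struct pgt {
	int levels;
	unsigned int flags;
	void (*force_pte_cb)(void);
};

/* Full initialiser, analogous to __kvm_pgtable_stage2_init(). */
static int pgt_init_full(struct pgt *pgt, int levels, unsigned int flags,
			 void (*force_pte_cb)(void))
{
	assert(pgt != NULL);
	pgt->levels = levels;
	pgt->flags = flags;
	pgt->force_pte_cb = force_pte_cb;
	return 0;
}

/*
 * Had this been "#define pgt_init(pgt, levels) pgt_init_full(...)",
 * __typeof__(pgt_init) would fail to compile: without a following '(' the
 * macro is not expanded and pgt_init is undeclared. A function has a type.
 */
static inline int pgt_init(struct pgt *pgt, int levels)
{
	return pgt_init_full(pgt, levels, 0, NULL);
}

/* A pointer whose type is taken from the wrapper's signature. */
static __typeof__(pgt_init) *pgt_init_fn = pgt_init;
```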




* [PATCH v2 08/18] KVM: arm64: Add {get,put}_pkvm_hyp_vm() helpers
  2024-12-03 10:37 [PATCH v2 00/18] KVM: arm64: Non-protected guest stage-2 support for pKVM Quentin Perret
                   ` (6 preceding siblings ...)
  2024-12-03 10:37 ` [PATCH v2 07/18] KVM: arm64: Make kvm_pgtable_stage2_init() a static inline function Quentin Perret
@ 2024-12-03 10:37 ` Quentin Perret
  2024-12-03 10:37 ` [PATCH v2 09/18] KVM: arm64: Introduce __pkvm_vcpu_{load,put}() Quentin Perret
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 50+ messages in thread
From: Quentin Perret @ 2024-12-03 10:37 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon
  Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
	kvmarm, linux-kernel

In preparation for accessing pkvm_hyp_vm structures at EL2 in a context
where we can't always expect a vCPU to be loaded (e.g. MMU notifiers),
introduce get/put helpers to take and release temporary references to
hyp VMs from any context.

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h |  3 +++
 arch/arm64/kvm/hyp/nvhe/pkvm.c         | 20 ++++++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index 24a9a8330d19..f361d8b91930 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -70,4 +70,7 @@ struct pkvm_hyp_vcpu *pkvm_load_hyp_vcpu(pkvm_handle_t handle,
 					 unsigned int vcpu_idx);
 void pkvm_put_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu);
 
+struct pkvm_hyp_vm *get_pkvm_hyp_vm(pkvm_handle_t handle);
+void put_pkvm_hyp_vm(struct pkvm_hyp_vm *hyp_vm);
+
 #endif /* __ARM64_KVM_NVHE_PKVM_H__ */
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 01616c39a810..4db88bedf8d5 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -327,6 +327,26 @@ void pkvm_put_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
 	hyp_spin_unlock(&vm_table_lock);
 }
 
+struct pkvm_hyp_vm *get_pkvm_hyp_vm(pkvm_handle_t handle)
+{
+	struct pkvm_hyp_vm *hyp_vm;
+
+	hyp_spin_lock(&vm_table_lock);
+	hyp_vm = get_vm_by_handle(handle);
+	if (hyp_vm)
+		hyp_page_ref_inc(hyp_virt_to_page(hyp_vm));
+	hyp_spin_unlock(&vm_table_lock);
+
+	return hyp_vm;
+}
+
+void put_pkvm_hyp_vm(struct pkvm_hyp_vm *hyp_vm)
+{
+	hyp_spin_lock(&vm_table_lock);
+	hyp_page_ref_dec(hyp_virt_to_page(hyp_vm));
+	hyp_spin_unlock(&vm_table_lock);
+}
+
 static void pkvm_init_features_from_host(struct pkvm_hyp_vm *hyp_vm, const struct kvm *host_kvm)
 {
 	struct kvm *kvm = &hyp_vm->kvm;
-- 
2.47.0.338.g60cca15819-goog
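The helpers combine two things: a lookup performed under the table lock, and a reference count that keeps the object pinned after the lock is dropped. The user-space sketch below models that pattern. A C11 atomic_flag spinlock stands in for hyp_spin_lock(), and a plain counter stands in for hyp_page_ref_inc()/hyp_page_ref_dec(). struct vm, get_vm(), put_vm() and remove_vm() are hypothetical. Having teardown refuse to run while references are held is this sketch's assumption about what the reference protects against.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_VMS 4

struct vm {
	int refcount;	/* temporary references held via get_vm() */
};

/* Spinlock built on atomic_flag, standing in for hyp_spin_lock(). */
static atomic_flag vm_table_lock = ATOMIC_FLAG_INIT;
static struct vm *vm_table[MAX_VMS];

static void table_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&vm_table_lock, memory_order_acquire))
		; /* spin */
}

static void table_unlock(void)
{
	atomic_flag_clear_explicit(&vm_table_lock, memory_order_release);
}

/* Look up a VM by handle and pin it; returns NULL if there is none. */
static struct vm *get_vm(unsigned int handle)
{
	struct vm *vm;

	table_lock();
	vm = handle < MAX_VMS ? vm_table[handle] : NULL;
	if (vm)
		vm->refcount++;
	table_unlock();
	return vm;
}

/* Drop a reference taken by get_vm(). */
static void put_vm(struct vm *vm)
{
	table_lock();
	assert(vm->refcount > 0);
	vm->refcount--;
	table_unlock();
}

/* Teardown: refuse while any temporary reference is still held. */
static int remove_vm(unsigned int handle)
{
	int ret = 0;

	table_lock();
	if (handle >= MAX_VMS || !vm_table[handle])
		ret = -ENOENT;
	else if (vm_table[handle]->refcount)
		ret = -EBUSY;
	else
		vm_table[handle] = NULL;
	table_unlock();
	return ret;
}
```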




* [PATCH v2 09/18] KVM: arm64: Introduce __pkvm_vcpu_{load,put}()
  2024-12-03 10:37 [PATCH v2 00/18] KVM: arm64: Non-protected guest stage-2 support for pKVM Quentin Perret
                   ` (7 preceding siblings ...)
  2024-12-03 10:37 ` [PATCH v2 08/18] KVM: arm64: Add {get,put}_pkvm_hyp_vm() helpers Quentin Perret
@ 2024-12-03 10:37 ` Quentin Perret
  2024-12-03 10:37 ` [PATCH v2 10/18] KVM: arm64: Introduce __pkvm_host_share_guest() Quentin Perret
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 50+ messages in thread
From: Quentin Perret @ 2024-12-03 10:37 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon
  Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
	kvmarm, linux-kernel

From: Marc Zyngier <maz@kernel.org>

Rather than looking up the hyp vCPU on every run hypercall at EL2,
introduce a per-CPU 'loaded_hyp_vcpu' tracking variable which is updated
by a pair of load/put hypercalls called directly from
kvm_arch_vcpu_{load,put}() when pKVM is enabled.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_asm.h       |  2 ++
 arch/arm64/kvm/arm.c                   | 14 ++++++++
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h |  7 ++++
 arch/arm64/kvm/hyp/nvhe/hyp-main.c     | 47 ++++++++++++++++++++------
 arch/arm64/kvm/hyp/nvhe/pkvm.c         | 29 ++++++++++++++++
 arch/arm64/kvm/vgic/vgic-v3.c          |  6 ++--
 6 files changed, 93 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index ca2590344313..89c0fac69551 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -79,6 +79,8 @@ enum __kvm_host_smccc_func {
 	__KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
 	__KVM_HOST_SMCCC_FUNC___pkvm_init_vcpu,
 	__KVM_HOST_SMCCC_FUNC___pkvm_teardown_vm,
+	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_load,
+	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_put,
 };
 
 #define DECLARE_KVM_VHE_SYM(sym)	extern char sym[]
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index a102c3aebdbc..55cc62b2f469 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -619,12 +619,26 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
 	kvm_arch_vcpu_load_debug_state_flags(vcpu);
 
+	if (is_protected_kvm_enabled()) {
+		kvm_call_hyp_nvhe(__pkvm_vcpu_load,
+				  vcpu->kvm->arch.pkvm.handle,
+				  vcpu->vcpu_idx, vcpu->arch.hcr_el2);
+		kvm_call_hyp(__vgic_v3_restore_vmcr_aprs,
+			     &vcpu->arch.vgic_cpu.vgic_v3);
+	}
+
 	if (!cpumask_test_cpu(cpu, vcpu->kvm->arch.supported_cpus))
 		vcpu_set_on_unsupported_cpu(vcpu);
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
+	if (is_protected_kvm_enabled()) {
+		kvm_call_hyp(__vgic_v3_save_vmcr_aprs,
+			     &vcpu->arch.vgic_cpu.vgic_v3);
+		kvm_call_hyp_nvhe(__pkvm_vcpu_put);
+	}
+
 	kvm_arch_vcpu_put_debug_state_flags(vcpu);
 	kvm_arch_vcpu_put_fp(vcpu);
 	if (has_vhe())
diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index f361d8b91930..be52c5b15e21 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -20,6 +20,12 @@ struct pkvm_hyp_vcpu {
 
 	/* Backpointer to the host's (untrusted) vCPU instance. */
 	struct kvm_vcpu *host_vcpu;
+
+	/*
+	 * If this hyp vCPU is loaded, this is a backpointer to the per-CPU
+	 * pointer tracking it. Otherwise, it is NULL.
+	 */
+	struct pkvm_hyp_vcpu **loaded_hyp_vcpu;
 };
 
 /*
@@ -69,6 +75,7 @@ int __pkvm_teardown_vm(pkvm_handle_t handle);
 struct pkvm_hyp_vcpu *pkvm_load_hyp_vcpu(pkvm_handle_t handle,
 					 unsigned int vcpu_idx);
 void pkvm_put_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu);
+struct pkvm_hyp_vcpu *pkvm_get_loaded_hyp_vcpu(void);
 
 struct pkvm_hyp_vm *get_pkvm_hyp_vm(pkvm_handle_t handle);
 void put_pkvm_hyp_vm(struct pkvm_hyp_vm *hyp_vm);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 6aa0b13d86e5..95d78db315b3 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -141,16 +141,46 @@ static void sync_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
 		host_cpu_if->vgic_lr[i] = hyp_cpu_if->vgic_lr[i];
 }
 
+static void handle___pkvm_vcpu_load(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
+	DECLARE_REG(unsigned int, vcpu_idx, host_ctxt, 2);
+	DECLARE_REG(u64, hcr_el2, host_ctxt, 3);
+	struct pkvm_hyp_vcpu *hyp_vcpu;
+
+	if (!is_protected_kvm_enabled())
+		return;
+
+	hyp_vcpu = pkvm_load_hyp_vcpu(handle, vcpu_idx);
+	if (!hyp_vcpu)
+		return;
+
+	if (pkvm_hyp_vcpu_is_protected(hyp_vcpu)) {
+		/* Propagate WFx trapping flags */
+		hyp_vcpu->vcpu.arch.hcr_el2 &= ~(HCR_TWE | HCR_TWI);
+		hyp_vcpu->vcpu.arch.hcr_el2 |= hcr_el2 & (HCR_TWE | HCR_TWI);
+	}
+}
+
+static void handle___pkvm_vcpu_put(struct kvm_cpu_context *host_ctxt)
+{
+	struct pkvm_hyp_vcpu *hyp_vcpu;
+
+	if (!is_protected_kvm_enabled())
+		return;
+
+	hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
+	if (hyp_vcpu)
+		pkvm_put_hyp_vcpu(hyp_vcpu);
+}
+
 static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
 {
 	DECLARE_REG(struct kvm_vcpu *, host_vcpu, host_ctxt, 1);
 	int ret;
 
-	host_vcpu = kern_hyp_va(host_vcpu);
-
 	if (unlikely(is_protected_kvm_enabled())) {
-		struct pkvm_hyp_vcpu *hyp_vcpu;
-		struct kvm *host_kvm;
+		struct pkvm_hyp_vcpu *hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
 
 		/*
 		 * KVM (and pKVM) doesn't support SME guests for now, and
@@ -163,9 +193,6 @@ static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
 			goto out;
 		}
 
-		host_kvm = kern_hyp_va(host_vcpu->kvm);
-		hyp_vcpu = pkvm_load_hyp_vcpu(host_kvm->arch.pkvm.handle,
-					      host_vcpu->vcpu_idx);
 		if (!hyp_vcpu) {
 			ret = -EINVAL;
 			goto out;
@@ -176,12 +203,10 @@ static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
 		ret = __kvm_vcpu_run(&hyp_vcpu->vcpu);
 
 		sync_hyp_vcpu(hyp_vcpu);
-		pkvm_put_hyp_vcpu(hyp_vcpu);
 	} else {
 		/* The host is fully trusted, run its vCPU directly. */
-		ret = __kvm_vcpu_run(host_vcpu);
+		ret = __kvm_vcpu_run(kern_hyp_va(host_vcpu));
 	}
-
 out:
 	cpu_reg(host_ctxt, 1) =  ret;
 }
@@ -409,6 +434,8 @@ static const hcall_t host_hcall[] = {
 	HANDLE_FUNC(__pkvm_init_vm),
 	HANDLE_FUNC(__pkvm_init_vcpu),
 	HANDLE_FUNC(__pkvm_teardown_vm),
+	HANDLE_FUNC(__pkvm_vcpu_load),
+	HANDLE_FUNC(__pkvm_vcpu_put),
 };
 
 static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 4db88bedf8d5..d5c23449a64c 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -23,6 +23,12 @@ unsigned int kvm_arm_vmid_bits;
 
 unsigned int kvm_host_sve_max_vl;
 
+/*
+ * The currently loaded hyp vCPU for each physical CPU. Used only when
+ * protected KVM is enabled, but for both protected and non-protected VMs.
+ */
+static DEFINE_PER_CPU(struct pkvm_hyp_vcpu *, loaded_hyp_vcpu);
+
 /*
  * Set trap register values based on features in ID_AA64PFR0.
  */
@@ -306,15 +312,30 @@ struct pkvm_hyp_vcpu *pkvm_load_hyp_vcpu(pkvm_handle_t handle,
 	struct pkvm_hyp_vcpu *hyp_vcpu = NULL;
 	struct pkvm_hyp_vm *hyp_vm;
 
+	/* Cannot load a new vcpu without putting the old one first. */
+	if (__this_cpu_read(loaded_hyp_vcpu))
+		return NULL;
+
 	hyp_spin_lock(&vm_table_lock);
 	hyp_vm = get_vm_by_handle(handle);
 	if (!hyp_vm || hyp_vm->nr_vcpus <= vcpu_idx)
 		goto unlock;
 
 	hyp_vcpu = hyp_vm->vcpus[vcpu_idx];
+
+	/* Ensure vcpu isn't loaded on more than one cpu simultaneously. */
+	if (unlikely(hyp_vcpu->loaded_hyp_vcpu)) {
+		hyp_vcpu = NULL;
+		goto unlock;
+	}
+
+	hyp_vcpu->loaded_hyp_vcpu = this_cpu_ptr(&loaded_hyp_vcpu);
 	hyp_page_ref_inc(hyp_virt_to_page(hyp_vm));
 unlock:
 	hyp_spin_unlock(&vm_table_lock);
+
+	if (hyp_vcpu)
+		__this_cpu_write(loaded_hyp_vcpu, hyp_vcpu);
 	return hyp_vcpu;
 }
 
@@ -323,10 +344,18 @@ void pkvm_put_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
 	struct pkvm_hyp_vm *hyp_vm = pkvm_hyp_vcpu_to_hyp_vm(hyp_vcpu);
 
 	hyp_spin_lock(&vm_table_lock);
+	hyp_vcpu->loaded_hyp_vcpu = NULL;
+	__this_cpu_write(loaded_hyp_vcpu, NULL);
 	hyp_page_ref_dec(hyp_virt_to_page(hyp_vm));
 	hyp_spin_unlock(&vm_table_lock);
 }
 
+struct pkvm_hyp_vcpu *pkvm_get_loaded_hyp_vcpu(void)
+{
+	return __this_cpu_read(loaded_hyp_vcpu);
+
+}
+
 struct pkvm_hyp_vm *get_pkvm_hyp_vm(pkvm_handle_t handle)
 {
 	struct pkvm_hyp_vm *hyp_vm;
diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
index f267bc2486a1..c2ef41fff079 100644
--- a/arch/arm64/kvm/vgic/vgic-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-v3.c
@@ -734,7 +734,8 @@ void vgic_v3_load(struct kvm_vcpu *vcpu)
 {
 	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
 
-	kvm_call_hyp(__vgic_v3_restore_vmcr_aprs, cpu_if);
+	if (likely(!is_protected_kvm_enabled()))
+		kvm_call_hyp(__vgic_v3_restore_vmcr_aprs, cpu_if);
 
 	if (has_vhe())
 		__vgic_v3_activate_traps(cpu_if);
@@ -746,7 +747,8 @@ void vgic_v3_put(struct kvm_vcpu *vcpu)
 {
 	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
 
-	kvm_call_hyp(__vgic_v3_save_vmcr_aprs, cpu_if);
+	if (likely(!is_protected_kvm_enabled()))
+		kvm_call_hyp(__vgic_v3_save_vmcr_aprs, cpu_if);
 	WARN_ON(vgic_v4_put(vcpu));
 
 	if (has_vhe())
-- 
2.47.0.338.g60cca15819-goog
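pkvm_load_hyp_vcpu() enforces two invariants, modelled single-threaded below. First, a physical CPU must put its current vCPU before loading another. Second, a vCPU cannot be loaded on two CPUs at once; the backpointer to the per-CPU slot makes this cheap to check. CPUs are simulated by an explicit index into an array that stands in for DEFINE_PER_CPU(). Locking and the hyp VM reference count are omitted, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 2

struct hyp_vcpu {
	/* Backpointer to the per-CPU slot tracking us, or NULL if not loaded. */
	struct hyp_vcpu **loaded_slot;
};

/* Stand-in for a DEFINE_PER_CPU() variable: one slot per physical CPU. */
static struct hyp_vcpu *loaded_hyp_vcpu[NR_CPUS];

static struct hyp_vcpu *vcpu_load(int cpu, struct hyp_vcpu *vcpu)
{
	/* A CPU must put its current vCPU before loading another one. */
	if (loaded_hyp_vcpu[cpu])
		return NULL;
	/* A vCPU must not be loaded on more than one CPU at a time. */
	if (vcpu->loaded_slot)
		return NULL;

	vcpu->loaded_slot = &loaded_hyp_vcpu[cpu];
	loaded_hyp_vcpu[cpu] = vcpu;
	return vcpu;
}

/* What the run path uses instead of a handle-based lookup. */
static struct hyp_vcpu *get_loaded_vcpu(int cpu)
{
	return loaded_hyp_vcpu[cpu];
}

static void vcpu_put(int cpu)
{
	struct hyp_vcpu *vcpu = loaded_hyp_vcpu[cpu];

	if (!vcpu)
		return;
	assert(vcpu->loaded_slot == &loaded_hyp_vcpu[cpu]);
	vcpu->loaded_slot = NULL;
	loaded_hyp_vcpu[cpu] = NULL;
}
```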




* [PATCH v2 10/18] KVM: arm64: Introduce __pkvm_host_share_guest()
  2024-12-03 10:37 [PATCH v2 00/18] KVM: arm64: Non-protected guest stage-2 support for pKVM Quentin Perret
                   ` (8 preceding siblings ...)
  2024-12-03 10:37 ` [PATCH v2 09/18] KVM: arm64: Introduce __pkvm_vcpu_{load,put}() Quentin Perret
@ 2024-12-03 10:37 ` Quentin Perret
  2024-12-10 13:58   ` Fuad Tabba
  2024-12-03 10:37 ` [PATCH v2 11/18] KVM: arm64: Introduce __pkvm_host_unshare_guest() Quentin Perret
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 50+ messages in thread
From: Quentin Perret @ 2024-12-03 10:37 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon
  Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
	kvmarm, linux-kernel

In preparation for handling guest stage-2 mappings at EL2, introduce a
new pKVM hypercall allowing the host to share pages with non-protected
guests.

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/kvm_asm.h              |  1 +
 arch/arm64/include/asm/kvm_host.h             |  3 +
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  1 +
 arch/arm64/kvm/hyp/include/nvhe/memory.h      |  2 +
 arch/arm64/kvm/hyp/nvhe/hyp-main.c            | 34 +++++++++
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 70 +++++++++++++++++++
 arch/arm64/kvm/hyp/nvhe/pkvm.c                |  7 ++
 7 files changed, 118 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 89c0fac69551..449337f5b2a3 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -65,6 +65,7 @@ enum __kvm_host_smccc_func {
 	/* Hypercalls available after pKVM finalisation */
 	__KVM_HOST_SMCCC_FUNC___pkvm_host_share_hyp,
 	__KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_hyp,
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_share_guest,
 	__KVM_HOST_SMCCC_FUNC___kvm_adjust_pc,
 	__KVM_HOST_SMCCC_FUNC___kvm_vcpu_run,
 	__KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e18e9244d17a..f75988e3515b 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -771,6 +771,9 @@ struct kvm_vcpu_arch {
 	/* Cache some mmu pages needed inside spinlock regions */
 	struct kvm_mmu_memory_cache mmu_page_cache;
 
+	/* Pages to be donated to pkvm/EL2 if it runs out */
+	struct kvm_hyp_memcache pkvm_memcache;
+
 	/* Virtual SError ESR to restore when HCR_EL2.VSE is set */
 	u64 vsesr_el2;
 
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 25038ac705d8..a7976e50f556 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -39,6 +39,7 @@ int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages);
 int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
 int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
 int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
+int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu, enum kvm_pgtable_prot prot);
 
 bool addr_is_memory(phys_addr_t phys);
 int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
index 08f3a0416d4c..457318215155 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
@@ -47,6 +47,8 @@ struct hyp_page {
 
 	/* Host (non-meta) state. Guarded by the host stage-2 lock. */
 	enum pkvm_page_state host_state : 8;
+
+	u32 host_share_guest_count;
 };
 
 extern u64 __hyp_vmemmap;
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 95d78db315b3..d659462fbf5d 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -211,6 +211,39 @@ static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
 	cpu_reg(host_ctxt, 1) =  ret;
 }
 
+static int pkvm_refill_memcache(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
+
+	return refill_memcache(&hyp_vcpu->vcpu.arch.pkvm_memcache,
+			       host_vcpu->arch.pkvm_memcache.nr_pages,
+			       &host_vcpu->arch.pkvm_memcache);
+}
+
+static void handle___pkvm_host_share_guest(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(u64, pfn, host_ctxt, 1);
+	DECLARE_REG(u64, gfn, host_ctxt, 2);
+	DECLARE_REG(enum kvm_pgtable_prot, prot, host_ctxt, 3);
+	struct pkvm_hyp_vcpu *hyp_vcpu;
+	int ret = -EINVAL;
+
+	if (!is_protected_kvm_enabled())
+		goto out;
+
+	hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
+	if (!hyp_vcpu || pkvm_hyp_vcpu_is_protected(hyp_vcpu))
+		goto out;
+
+	ret = pkvm_refill_memcache(hyp_vcpu);
+	if (ret)
+		goto out;
+
+	ret = __pkvm_host_share_guest(pfn, gfn, hyp_vcpu, prot);
+out:
+	cpu_reg(host_ctxt, 1) =  ret;
+}
+
 static void handle___kvm_adjust_pc(struct kvm_cpu_context *host_ctxt)
 {
 	DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
@@ -420,6 +453,7 @@ static const hcall_t host_hcall[] = {
 
 	HANDLE_FUNC(__pkvm_host_share_hyp),
 	HANDLE_FUNC(__pkvm_host_unshare_hyp),
+	HANDLE_FUNC(__pkvm_host_share_guest),
 	HANDLE_FUNC(__kvm_adjust_pc),
 	HANDLE_FUNC(__kvm_vcpu_run),
 	HANDLE_FUNC(__kvm_flush_vm_context),
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 1595081c4f6b..a69d7212b64c 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -861,6 +861,27 @@ static int hyp_complete_donation(u64 addr,
 	return pkvm_create_mappings_locked(start, end, prot);
 }
 
+static enum pkvm_page_state guest_get_page_state(kvm_pte_t pte, u64 addr)
+{
+	if (!kvm_pte_valid(pte))
+		return PKVM_NOPAGE;
+
+	return pkvm_getstate(kvm_pgtable_stage2_pte_prot(pte));
+}
+
+static int __guest_check_page_state_range(struct pkvm_hyp_vcpu *vcpu, u64 addr,
+					  u64 size, enum pkvm_page_state state)
+{
+	struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
+	struct check_walk_data d = {
+		.desired	= state,
+		.get_page_state	= guest_get_page_state,
+	};
+
+	hyp_assert_lock_held(&vm->lock);
+	return check_page_state_range(&vm->pgt, addr, size, &d);
+}
+
 static int check_share(struct pkvm_mem_share *share)
 {
 	const struct pkvm_mem_transition *tx = &share->tx;
@@ -1343,3 +1364,52 @@ int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages)
 
 	return ret;
 }
+
+int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu,
+			    enum kvm_pgtable_prot prot)
+{
+	struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
+	u64 phys = hyp_pfn_to_phys(pfn);
+	u64 ipa = hyp_pfn_to_phys(gfn);
+	struct hyp_page *page;
+	int ret;
+
+	if (prot & ~KVM_PGTABLE_PROT_RWX)
+		return -EINVAL;
+
+	ret = range_is_allowed_memory(phys, phys + PAGE_SIZE);
+	if (ret)
+		return ret;
+
+	host_lock_component();
+	guest_lock_component(vm);
+
+	ret = __guest_check_page_state_range(vcpu, ipa, PAGE_SIZE, PKVM_NOPAGE);
+	if (ret)
+		goto unlock;
+
+	page = hyp_phys_to_page(phys);
+	switch (page->host_state) {
+	case PKVM_PAGE_OWNED:
+		WARN_ON(__host_set_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_SHARED_OWNED));
+		break;
+	case PKVM_PAGE_SHARED_OWNED:
+		/* Only host to np-guest multi-sharing is tolerated */
+		WARN_ON(!page->host_share_guest_count);
+		break;
+	default:
+		ret = -EPERM;
+		goto unlock;
+	}
+
+	WARN_ON(kvm_pgtable_stage2_map(&vm->pgt, ipa, PAGE_SIZE, phys,
+				       pkvm_mkstate(prot, PKVM_PAGE_SHARED_BORROWED),
+				       &vcpu->vcpu.arch.pkvm_memcache, 0));
+	page->host_share_guest_count++;
+
+unlock:
+	guest_unlock_component(vm);
+	host_unlock_component();
+
+	return ret;
+}
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index d5c23449a64c..d6c61a5e7b6e 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -795,6 +795,13 @@ int __pkvm_teardown_vm(pkvm_handle_t handle)
 	/* Push the metadata pages to the teardown memcache */
 	for (idx = 0; idx < hyp_vm->nr_vcpus; ++idx) {
 		struct pkvm_hyp_vcpu *hyp_vcpu = hyp_vm->vcpus[idx];
+		struct kvm_hyp_memcache *vcpu_mc = &hyp_vcpu->vcpu.arch.pkvm_memcache;
+
+		while (vcpu_mc->nr_pages) {
+			void *addr = pop_hyp_memcache(vcpu_mc, hyp_phys_to_virt);
+			push_hyp_memcache(mc, addr, hyp_virt_to_phys);
+			unmap_donated_memory_noclear(addr, PAGE_SIZE);
+		}
 
 		teardown_donated_memory(mc, hyp_vcpu, sizeof(*hyp_vcpu));
 	}
-- 
2.47.0.338.g60cca15819-goog
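Below is a simplified model of the host-side state transitions in __pkvm_host_share_guest(). An owned page becomes SHARED_OWNED on its first guest share. After that, further shares are only accepted while host_share_guest_count shows that the existing shares are with guests. Pages and guest stage-2 tables are reduced to plain structs and arrays, and errors to errno values. All names, including the PAGE_NOT_OWNED state, are hypothetical. The posted patch only WARN_ON()s when a SHARED_OWNED page has a zero count; this sketch rejects that case with -EPERM instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

enum page_state { PAGE_OWNED, PAGE_SHARED_OWNED, PAGE_NOT_OWNED };

/* Host-side metadata for one physical page (cf. struct hyp_page). */
struct page_meta {
	enum page_state host_state;
	uint32_t share_guest_count;
};

#define GUEST_PAGES 8

/* Toy guest stage-2: maps a guest frame number to a host pfn, -1 if none. */
struct guest {
	long gfn_to_pfn[GUEST_PAGES];
};

static void guest_init(struct guest *g)
{
	for (int i = 0; i < GUEST_PAGES; i++)
		g->gfn_to_pfn[i] = -1;
}

static int share_guest(struct page_meta *page, long pfn, struct guest *g,
		       unsigned int gfn)
{
	assert(pfn >= 0);
	if (gfn >= GUEST_PAGES)
		return -EINVAL;
	/* The guest IPA must not already be mapped (cf. PKVM_NOPAGE). */
	if (g->gfn_to_pfn[gfn] != -1)
		return -EPERM;

	switch (page->host_state) {
	case PAGE_OWNED:
		/* First share: the host keeps ownership but marks it shared. */
		page->host_state = PAGE_SHARED_OWNED;
		break;
	case PAGE_SHARED_OWNED:
		/* Only tolerate re-sharing if existing shares are with guests. */
		if (!page->share_guest_count)
			return -EPERM;
		break;
	default:
		return -EPERM;
	}

	g->gfn_to_pfn[gfn] = pfn;
	page->share_guest_count++;
	return 0;
}
```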




* [PATCH v2 11/18] KVM: arm64: Introduce __pkvm_host_unshare_guest()
  2024-12-03 10:37 [PATCH v2 00/18] KVM: arm64: Non-protected guest stage-2 support for pKVM Quentin Perret
                   ` (9 preceding siblings ...)
  2024-12-03 10:37 ` [PATCH v2 10/18] KVM: arm64: Introduce __pkvm_host_share_guest() Quentin Perret
@ 2024-12-03 10:37 ` Quentin Perret
  2024-12-10 14:41   ` Fuad Tabba
  2024-12-03 10:37 ` [PATCH v2 12/18] KVM: arm64: Introduce __pkvm_host_relax_guest_perms() Quentin Perret
                   ` (6 subsequent siblings)
  17 siblings, 1 reply; 50+ messages in thread
From: Quentin Perret @ 2024-12-03 10:37 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon
  Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
	kvmarm, linux-kernel

In preparation for letting the host unmap pages from non-protected
guests, introduce a new hypercall implementing the host-unshare-guest
transition.

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/kvm_asm.h              |  1 +
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  1 +
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h        |  5 ++
 arch/arm64/kvm/hyp/nvhe/hyp-main.c            | 24 +++++++
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 67 +++++++++++++++++++
 5 files changed, 98 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 449337f5b2a3..0b6c4d325134 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -66,6 +66,7 @@ enum __kvm_host_smccc_func {
 	__KVM_HOST_SMCCC_FUNC___pkvm_host_share_hyp,
 	__KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_hyp,
 	__KVM_HOST_SMCCC_FUNC___pkvm_host_share_guest,
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_guest,
 	__KVM_HOST_SMCCC_FUNC___kvm_adjust_pc,
 	__KVM_HOST_SMCCC_FUNC___kvm_vcpu_run,
 	__KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index a7976e50f556..e528a42ed60e 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -40,6 +40,7 @@ int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
 int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
 int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
 int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu, enum kvm_pgtable_prot prot);
+int __pkvm_host_unshare_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm);
 
 bool addr_is_memory(phys_addr_t phys);
 int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index be52c5b15e21..5dfc9ece9aa5 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -64,6 +64,11 @@ static inline bool pkvm_hyp_vcpu_is_protected(struct pkvm_hyp_vcpu *hyp_vcpu)
 	return vcpu_is_protected(&hyp_vcpu->vcpu);
 }
 
+static inline bool pkvm_hyp_vm_is_protected(struct pkvm_hyp_vm *hyp_vm)
+{
+	return kvm_vm_is_protected(&hyp_vm->kvm);
+}
+
 void pkvm_hyp_vm_table_init(void *tbl);
 
 int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index d659462fbf5d..04a9053ae1d5 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -244,6 +244,29 @@ static void handle___pkvm_host_share_guest(struct kvm_cpu_context *host_ctxt)
 	cpu_reg(host_ctxt, 1) =  ret;
 }
 
+static void handle___pkvm_host_unshare_guest(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
+	DECLARE_REG(u64, gfn, host_ctxt, 2);
+	struct pkvm_hyp_vm *hyp_vm;
+	int ret = -EINVAL;
+
+	if (!is_protected_kvm_enabled())
+		goto out;
+
+	hyp_vm = get_pkvm_hyp_vm(handle);
+	if (!hyp_vm)
+		goto out;
+	if (pkvm_hyp_vm_is_protected(hyp_vm))
+		goto put_hyp_vm;
+
+	ret = __pkvm_host_unshare_guest(gfn, hyp_vm);
+put_hyp_vm:
+	put_pkvm_hyp_vm(hyp_vm);
+out:
+	cpu_reg(host_ctxt, 1) =  ret;
+}
+
 static void handle___kvm_adjust_pc(struct kvm_cpu_context *host_ctxt)
 {
 	DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
@@ -454,6 +477,7 @@ static const hcall_t host_hcall[] = {
 	HANDLE_FUNC(__pkvm_host_share_hyp),
 	HANDLE_FUNC(__pkvm_host_unshare_hyp),
 	HANDLE_FUNC(__pkvm_host_share_guest),
+	HANDLE_FUNC(__pkvm_host_unshare_guest),
 	HANDLE_FUNC(__kvm_adjust_pc),
 	HANDLE_FUNC(__kvm_vcpu_run),
 	HANDLE_FUNC(__kvm_flush_vm_context),
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index a69d7212b64c..aa27a3e42e5e 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -1413,3 +1413,70 @@ int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu,
 
 	return ret;
 }
+
+static int __check_host_unshare_guest(struct pkvm_hyp_vm *vm, u64 *__phys, u64 ipa)
+{
+	enum pkvm_page_state state;
+	struct hyp_page *page;
+	kvm_pte_t pte;
+	u64 phys;
+	s8 level;
+	int ret;
+
+	ret = kvm_pgtable_get_leaf(&vm->pgt, ipa, &pte, &level);
+	if (ret)
+		return ret;
+	if (level != KVM_PGTABLE_LAST_LEVEL)
+		return -E2BIG;
+	if (!kvm_pte_valid(pte))
+		return -ENOENT;
+
+	state = guest_get_page_state(pte, ipa);
+	if (state != PKVM_PAGE_SHARED_BORROWED)
+		return -EPERM;
+
+	phys = kvm_pte_to_phys(pte);
+	ret = range_is_allowed_memory(phys, phys + PAGE_SIZE);
+	if (WARN_ON(ret))
+		return ret;
+
+	page = hyp_phys_to_page(phys);
+	if (page->host_state != PKVM_PAGE_SHARED_OWNED)
+		return -EPERM;
+	if (WARN_ON(!page->host_share_guest_count))
+		return -EINVAL;
+
+	*__phys = phys;
+
+	return 0;
+}
+
+int __pkvm_host_unshare_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm)
+{
+	u64 ipa = hyp_pfn_to_phys(gfn);
+	struct hyp_page *page;
+	u64 phys;
+	int ret;
+
+	host_lock_component();
+	guest_lock_component(hyp_vm);
+
+	ret = __check_host_unshare_guest(hyp_vm, &phys, ipa);
+	if (ret)
+		goto unlock;
+
+	ret = kvm_pgtable_stage2_unmap(&hyp_vm->pgt, ipa, PAGE_SIZE);
+	if (ret)
+		goto unlock;
+
+	page = hyp_phys_to_page(phys);
+	page->host_share_guest_count--;
+	if (!page->host_share_guest_count)
+		WARN_ON(__host_set_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_OWNED));
+
+unlock:
+	guest_unlock_component(hyp_vm);
+	host_unlock_component();
+
+	return ret;
+}
-- 
2.47.0.338.g60cca15819-goog
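The reverse transition can be modelled the same way, following __check_host_unshare_guest() and __pkvm_host_unshare_guest() in the diff above. The IPA must be mapped, and the host page must be SHARED_OWNED with a non-zero guest-share count. The page returns to OWNED when its last guest share is dropped. The hypercall looks up the VM by handle through get_pkvm_hyp_vm() rather than using the loaded vCPU. This is presumably so that it can run where no vCPU is loaded; patch 08 cites MMU notifiers. The sketch is user-space only and handles only page-sized mappings (the patch returns -E2BIG for anything else). All names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

enum page_state { PAGE_OWNED, PAGE_SHARED_OWNED };

struct page_meta {
	enum page_state host_state;
	uint32_t share_guest_count;
};

#define NR_PAGES    4
#define GUEST_PAGES 8

/* Host-side metadata indexed by pfn (cf. the hypervisor's vmemmap). */
static struct page_meta pages[NR_PAGES];

/* Toy guest stage-2: gfn -> pfn, -1 if unmapped. */
struct guest {
	int gfn_to_pfn[GUEST_PAGES];
};

static int unshare_guest(struct guest *g, unsigned int gfn)
{
	struct page_meta *page;
	int pfn;

	if (gfn >= GUEST_PAGES)
		return -EINVAL;
	pfn = g->gfn_to_pfn[gfn];
	if (pfn < 0)
		return -ENOENT;			/* nothing mapped at this IPA */
	assert(pfn < NR_PAGES);

	page = &pages[pfn];
	if (page->host_state != PAGE_SHARED_OWNED)
		return -EPERM;
	if (!page->share_guest_count)
		return -EINVAL;			/* inconsistent accounting */

	g->gfn_to_pfn[gfn] = -1;		/* unmap from the guest */
	if (--page->share_guest_count == 0)
		page->host_state = PAGE_OWNED;	/* last guest share gone */
	return 0;
}
```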




* [PATCH v2 12/18] KVM: arm64: Introduce __pkvm_host_relax_guest_perms()
  2024-12-03 10:37 [PATCH v2 00/18] KVM: arm64: Non-protected guest stage-2 support for pKVM Quentin Perret
                   ` (10 preceding siblings ...)
  2024-12-03 10:37 ` [PATCH v2 11/18] KVM: arm64: Introduce __pkvm_host_unshare_guest() Quentin Perret
@ 2024-12-03 10:37 ` Quentin Perret
  2024-12-10 14:56   ` Fuad Tabba
  2024-12-03 10:37 ` [PATCH v2 13/18] KVM: arm64: Introduce __pkvm_host_wrprotect_guest() Quentin Perret
                   ` (5 subsequent siblings)
  17 siblings, 1 reply; 50+ messages in thread
From: Quentin Perret @ 2024-12-03 10:37 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon
  Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
	kvmarm, linux-kernel

Introduce a new hypercall allowing the host to relax the stage-2
permissions of mappings in a non-protected guest page-table. It will be
used later once we start allowing RO memslots and dirty logging.
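
To make the semantics concrete, here is a small userspace model (not kernel code) of the check performed by __pkvm_host_relax_guest_perms(). Any requested bit outside R/W/X is rejected with -EPERM, and relaxing only ever adds permissions. The PROT_* bit values and model_relax_perms() are hypothetical stand-ins for enum kvm_pgtable_prot and the page-table update, chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the enum kvm_pgtable_prot bits. */
#define PROT_X		(1u << 0)
#define PROT_W		(1u << 1)
#define PROT_R		(1u << 2)
#define PROT_DEVICE	(1u << 3)
#define PROT_RWX	(PROT_R | PROT_W | PROT_X)

/*
 * Model of the relax path: reject anything beyond R/W/X, then OR the
 * requested bits in so existing permissions are never revoked.
 */
static int model_relax_perms(uint32_t *cur, uint32_t prot)
{
	if ((prot & PROT_RWX) != prot)
		return -EPERM;

	*cur |= prot;
	return 0;
}
```

Modelling relaxation as a bitwise OR captures the key property that this hypercall cannot be used to revoke access; removing write permission is a separate operation.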

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/kvm_asm.h              |  1 +
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  1 +
 arch/arm64/kvm/hyp/nvhe/hyp-main.c            | 20 ++++++++++++++++
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 23 +++++++++++++++++++
 4 files changed, 45 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 0b6c4d325134..5d51933e44fb 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -67,6 +67,7 @@ enum __kvm_host_smccc_func {
 	__KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_hyp,
 	__KVM_HOST_SMCCC_FUNC___pkvm_host_share_guest,
 	__KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_guest,
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_relax_guest_perms,
 	__KVM_HOST_SMCCC_FUNC___kvm_adjust_pc,
 	__KVM_HOST_SMCCC_FUNC___kvm_vcpu_run,
 	__KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index e528a42ed60e..db0dd83c2457 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -41,6 +41,7 @@ int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
 int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
 int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu, enum kvm_pgtable_prot prot);
 int __pkvm_host_unshare_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm);
+int __pkvm_host_relax_guest_perms(u64 gfn, enum kvm_pgtable_prot prot, struct pkvm_hyp_vcpu *vcpu);
 
 bool addr_is_memory(phys_addr_t phys);
 int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 04a9053ae1d5..60dd56bbd743 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -267,6 +267,25 @@ static void handle___pkvm_host_unshare_guest(struct kvm_cpu_context *host_ctxt)
 	cpu_reg(host_ctxt, 1) =  ret;
 }
 
+static void handle___pkvm_host_relax_guest_perms(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(u64, gfn, host_ctxt, 1);
+	DECLARE_REG(enum kvm_pgtable_prot, prot, host_ctxt, 2);
+	struct pkvm_hyp_vcpu *hyp_vcpu;
+	int ret = -EINVAL;
+
+	if (!is_protected_kvm_enabled())
+		goto out;
+
+	hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
+	if (!hyp_vcpu || pkvm_hyp_vcpu_is_protected(hyp_vcpu))
+		goto out;
+
+	ret = __pkvm_host_relax_guest_perms(gfn, prot, hyp_vcpu);
+out:
+	cpu_reg(host_ctxt, 1) = ret;
+}
+
 static void handle___kvm_adjust_pc(struct kvm_cpu_context *host_ctxt)
 {
 	DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
@@ -478,6 +497,7 @@ static const hcall_t host_hcall[] = {
 	HANDLE_FUNC(__pkvm_host_unshare_hyp),
 	HANDLE_FUNC(__pkvm_host_share_guest),
 	HANDLE_FUNC(__pkvm_host_unshare_guest),
+	HANDLE_FUNC(__pkvm_host_relax_guest_perms),
 	HANDLE_FUNC(__kvm_adjust_pc),
 	HANDLE_FUNC(__kvm_vcpu_run),
 	HANDLE_FUNC(__kvm_flush_vm_context),
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index aa27a3e42e5e..d4b28e93e790 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -1480,3 +1480,26 @@ int __pkvm_host_unshare_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm)
 
 	return ret;
 }
+
+int __pkvm_host_relax_guest_perms(u64 gfn, enum kvm_pgtable_prot prot, struct pkvm_hyp_vcpu *vcpu)
+{
+	struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
+	u64 ipa = hyp_pfn_to_phys(gfn);
+	u64 phys;
+	int ret;
+
+	if ((prot & KVM_PGTABLE_PROT_RWX) != prot)
+		return -EPERM;
+
+	host_lock_component();
+	guest_lock_component(vm);
+
+	ret = __check_host_unshare_guest(vm, &phys, ipa);
+	if (!ret)
+		ret = kvm_pgtable_stage2_relax_perms(&vm->pgt, ipa, prot, 0);
+
+	guest_unlock_component(vm);
+	host_unlock_component();
+
+	return ret;
+}
-- 
2.47.0.338.g60cca15819-goog




* [PATCH v2 13/18] KVM: arm64: Introduce __pkvm_host_wrprotect_guest()
  2024-12-03 10:37 [PATCH v2 00/18] KVM: arm64: Non-protected guest stage-2 support for pKVM Quentin Perret
                   ` (11 preceding siblings ...)
  2024-12-03 10:37 ` [PATCH v2 12/18] KVM: arm64: Introduce __pkvm_host_relax_guest_perms() Quentin Perret
@ 2024-12-03 10:37 ` Quentin Perret
  2024-12-10 15:06   ` Fuad Tabba
  2024-12-03 10:37 ` [PATCH v2 14/18] KVM: arm64: Introduce __pkvm_host_test_clear_young_guest() Quentin Perret
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 50+ messages in thread
From: Quentin Perret @ 2024-12-03 10:37 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon
  Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
	kvmarm, linux-kernel

Introduce a new hypercall to remove the write permission from a
non-protected guest stage-2 mapping. This will be used, for example,
when enabling dirty logging.
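
As a rough sketch of how write-protection feeds dirty logging, the model below clears a W bit on a per-gfn entry and treats the next guest write as a permission fault that marks the page dirty. The array-based stage-2, the PTE_* bits and the fault handling (which simply restores W) are assumptions for illustration; TLB maintenance is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_GFNS		8
#define PTE_VALID	(1u << 0)	/* hypothetical bit layout */
#define PTE_W		(1u << 1)

static uint32_t model_s2[NR_GFNS];	/* one leaf entry per gfn */
static bool dirty[NR_GFNS];		/* host-side dirty bitmap model */

/* Stand-in for the wrprotect hypercall: drop write permission only. */
static void model_wrprotect(uint64_t gfn)
{
	model_s2[gfn] &= ~PTE_W;
}

/* Returns true if the write completes without a permission fault. */
static bool model_guest_write(uint64_t gfn)
{
	if (!(model_s2[gfn] & PTE_VALID))
		return false;
	if (model_s2[gfn] & PTE_W)
		return true;

	/* Permission fault: log the page, then grant W back. */
	dirty[gfn] = true;
	model_s2[gfn] |= PTE_W;
	return false;
}
```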

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/kvm_asm.h              |  1 +
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  1 +
 arch/arm64/kvm/hyp/nvhe/hyp-main.c            | 24 +++++++++++++++++++
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 19 +++++++++++++++
 4 files changed, 45 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 5d51933e44fb..4d7d20ea03df 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -68,6 +68,7 @@ enum __kvm_host_smccc_func {
 	__KVM_HOST_SMCCC_FUNC___pkvm_host_share_guest,
 	__KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_guest,
 	__KVM_HOST_SMCCC_FUNC___pkvm_host_relax_guest_perms,
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_wrprotect_guest,
 	__KVM_HOST_SMCCC_FUNC___kvm_adjust_pc,
 	__KVM_HOST_SMCCC_FUNC___kvm_vcpu_run,
 	__KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index db0dd83c2457..8658b5932473 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -42,6 +42,7 @@ int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
 int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu, enum kvm_pgtable_prot prot);
 int __pkvm_host_unshare_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm);
 int __pkvm_host_relax_guest_perms(u64 gfn, enum kvm_pgtable_prot prot, struct pkvm_hyp_vcpu *vcpu);
+int __pkvm_host_wrprotect_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm);
 
 bool addr_is_memory(phys_addr_t phys);
 int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 60dd56bbd743..3feaf2119e51 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -286,6 +286,29 @@ static void handle___pkvm_host_relax_guest_perms(struct kvm_cpu_context *host_ct
 	cpu_reg(host_ctxt, 1) = ret;
 }
 
+static void handle___pkvm_host_wrprotect_guest(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
+	DECLARE_REG(u64, gfn, host_ctxt, 2);
+	struct pkvm_hyp_vm *hyp_vm;
+	int ret = -EINVAL;
+
+	if (!is_protected_kvm_enabled())
+		goto out;
+
+	hyp_vm = get_pkvm_hyp_vm(handle);
+	if (!hyp_vm)
+		goto out;
+	if (pkvm_hyp_vm_is_protected(hyp_vm))
+		goto put_hyp_vm;
+
+	ret = __pkvm_host_wrprotect_guest(gfn, hyp_vm);
+put_hyp_vm:
+	put_pkvm_hyp_vm(hyp_vm);
+out:
+	cpu_reg(host_ctxt, 1) = ret;
+}
+
 static void handle___kvm_adjust_pc(struct kvm_cpu_context *host_ctxt)
 {
 	DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
@@ -498,6 +521,7 @@ static const hcall_t host_hcall[] = {
 	HANDLE_FUNC(__pkvm_host_share_guest),
 	HANDLE_FUNC(__pkvm_host_unshare_guest),
 	HANDLE_FUNC(__pkvm_host_relax_guest_perms),
+	HANDLE_FUNC(__pkvm_host_wrprotect_guest),
 	HANDLE_FUNC(__kvm_adjust_pc),
 	HANDLE_FUNC(__kvm_vcpu_run),
 	HANDLE_FUNC(__kvm_flush_vm_context),
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index d4b28e93e790..89312d7cde2a 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -1503,3 +1503,22 @@ int __pkvm_host_relax_guest_perms(u64 gfn, enum kvm_pgtable_prot prot, struct pk
 
 	return ret;
 }
+
+int __pkvm_host_wrprotect_guest(u64 gfn, struct pkvm_hyp_vm *vm)
+{
+	u64 ipa = hyp_pfn_to_phys(gfn);
+	u64 phys;
+	int ret;
+
+	host_lock_component();
+	guest_lock_component(vm);
+
+	ret = __check_host_unshare_guest(vm, &phys, ipa);
+	if (!ret)
+		ret = kvm_pgtable_stage2_wrprotect(&vm->pgt, ipa, PAGE_SIZE);
+
+	guest_unlock_component(vm);
+	host_unlock_component();
+
+	return ret;
+}
-- 
2.47.0.338.g60cca15819-goog




* [PATCH v2 14/18] KVM: arm64: Introduce __pkvm_host_test_clear_young_guest()
  2024-12-03 10:37 [PATCH v2 00/18] KVM: arm64: Non-protected guest stage-2 support for pKVM Quentin Perret
                   ` (12 preceding siblings ...)
  2024-12-03 10:37 ` [PATCH v2 13/18] KVM: arm64: Introduce __pkvm_host_wrprotect_guest() Quentin Perret
@ 2024-12-03 10:37 ` Quentin Perret
  2024-12-10 15:11   ` Fuad Tabba
  2024-12-03 10:37 ` [PATCH v2 15/18] KVM: arm64: Introduce __pkvm_host_mkyoung_guest() Quentin Perret
                   ` (3 subsequent siblings)
  17 siblings, 1 reply; 50+ messages in thread
From: Quentin Perret @ 2024-12-03 10:37 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon
  Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
	kvmarm, linux-kernel

Plumb the kvm_pgtable_stage2_test_clear_young() callback into pKVM for
non-protected guests. It will later be called from MMU notifiers.
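
A minimal model of what test_clear_young does for a single page, and how a range result is aggregated, assuming the access flag (AF) lives at bit 10 of the leaf entry. model_test_clear_young() and model_range_young() are hypothetical names; the range loop mirrors the `young |=` pattern used by pkvm_pgtable_test_clear_young() on the EL1 side.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_AF	(1ull << 10)	/* assumed access flag position */

/* Report whether AF is set and, if mkold, clear it. */
static bool model_test_clear_young(uint64_t *pte, bool mkold)
{
	bool young = *pte & PTE_AF;

	if (young && mkold)
		*pte &= ~PTE_AF;
	return young;
}

/* Range variant: OR the per-page results without short-circuiting. */
static bool model_range_young(uint64_t *ptes, size_t n, bool mkold)
{
	bool young = false;

	for (size_t i = 0; i < n; i++)
		young |= model_test_clear_young(&ptes[i], mkold);
	return young;
}
```

Using `|=` rather than `||` matters here: short-circuiting would stop clearing AF on the remaining pages as soon as one young page had been found.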

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/kvm_asm.h              |  1 +
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  1 +
 arch/arm64/kvm/hyp/nvhe/hyp-main.c            | 25 +++++++++++++++++++
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 19 ++++++++++++++
 4 files changed, 46 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 4d7d20ea03df..cb676017d591 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -69,6 +69,7 @@ enum __kvm_host_smccc_func {
 	__KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_guest,
 	__KVM_HOST_SMCCC_FUNC___pkvm_host_relax_guest_perms,
 	__KVM_HOST_SMCCC_FUNC___pkvm_host_wrprotect_guest,
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_test_clear_young_guest,
 	__KVM_HOST_SMCCC_FUNC___kvm_adjust_pc,
 	__KVM_HOST_SMCCC_FUNC___kvm_vcpu_run,
 	__KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 8658b5932473..554ce31882e6 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -43,6 +43,7 @@ int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu, enum k
 int __pkvm_host_unshare_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm);
 int __pkvm_host_relax_guest_perms(u64 gfn, enum kvm_pgtable_prot prot, struct pkvm_hyp_vcpu *vcpu);
 int __pkvm_host_wrprotect_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm);
+int __pkvm_host_test_clear_young_guest(u64 gfn, bool mkold, struct pkvm_hyp_vm *vm);
 
 bool addr_is_memory(phys_addr_t phys);
 int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 3feaf2119e51..67cb6e284180 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -309,6 +309,30 @@ static void handle___pkvm_host_wrprotect_guest(struct kvm_cpu_context *host_ctxt
 	cpu_reg(host_ctxt, 1) = ret;
 }
 
+static void handle___pkvm_host_test_clear_young_guest(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
+	DECLARE_REG(u64, gfn, host_ctxt, 2);
+	DECLARE_REG(bool, mkold, host_ctxt, 3);
+	struct pkvm_hyp_vm *hyp_vm;
+	int ret = -EINVAL;
+
+	if (!is_protected_kvm_enabled())
+		goto out;
+
+	hyp_vm = get_pkvm_hyp_vm(handle);
+	if (!hyp_vm)
+		goto out;
+	if (pkvm_hyp_vm_is_protected(hyp_vm))
+		goto put_hyp_vm;
+
+	ret = __pkvm_host_test_clear_young_guest(gfn, mkold, hyp_vm);
+put_hyp_vm:
+	put_pkvm_hyp_vm(hyp_vm);
+out:
+	cpu_reg(host_ctxt, 1) = ret;
+}
+
 static void handle___kvm_adjust_pc(struct kvm_cpu_context *host_ctxt)
 {
 	DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
@@ -522,6 +546,7 @@ static const hcall_t host_hcall[] = {
 	HANDLE_FUNC(__pkvm_host_unshare_guest),
 	HANDLE_FUNC(__pkvm_host_relax_guest_perms),
 	HANDLE_FUNC(__pkvm_host_wrprotect_guest),
+	HANDLE_FUNC(__pkvm_host_test_clear_young_guest),
 	HANDLE_FUNC(__kvm_adjust_pc),
 	HANDLE_FUNC(__kvm_vcpu_run),
 	HANDLE_FUNC(__kvm_flush_vm_context),
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 89312d7cde2a..0e064a7ed7c4 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -1522,3 +1522,22 @@ int __pkvm_host_wrprotect_guest(u64 gfn, struct pkvm_hyp_vm *vm)
 
 	return ret;
 }
+
+int __pkvm_host_test_clear_young_guest(u64 gfn, bool mkold, struct pkvm_hyp_vm *vm)
+{
+	u64 ipa = hyp_pfn_to_phys(gfn);
+	u64 phys;
+	int ret;
+
+	host_lock_component();
+	guest_lock_component(vm);
+
+	ret = __check_host_unshare_guest(vm, &phys, ipa);
+	if (!ret)
+		ret = kvm_pgtable_stage2_test_clear_young(&vm->pgt, ipa, PAGE_SIZE, mkold);
+
+	guest_unlock_component(vm);
+	host_unlock_component();
+
+	return ret;
+}
-- 
2.47.0.338.g60cca15819-goog




* [PATCH v2 15/18] KVM: arm64: Introduce __pkvm_host_mkyoung_guest()
  2024-12-03 10:37 [PATCH v2 00/18] KVM: arm64: Non-protected guest stage-2 support for pKVM Quentin Perret
                   ` (13 preceding siblings ...)
  2024-12-03 10:37 ` [PATCH v2 14/18] KVM: arm64: Introduce __pkvm_host_test_clear_young_guest() Quentin Perret
@ 2024-12-03 10:37 ` Quentin Perret
  2024-12-10 15:14   ` Fuad Tabba
  2024-12-03 10:37 ` [PATCH v2 16/18] KVM: arm64: Introduce __pkvm_tlb_flush_vmid() Quentin Perret
                   ` (2 subsequent siblings)
  17 siblings, 1 reply; 50+ messages in thread
From: Quentin Perret @ 2024-12-03 10:37 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon
  Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
	kvmarm, linux-kernel

Plumb the kvm_pgtable_stage2_mkyoung() callback into pKVM for
non-protected guests. It will be called later from the fault handling
path.
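
Like the relax, wrprotect and test_clear_young paths, mkyoung first runs __check_host_unshare_guest() and only touches the guest page-table if that check passes; the check's result is what gets returned to the host. The sketch below models that gating in userspace. The structs, the page_state enum and the assumption that level 3 is the last level (as with 4KiB pages) are illustrative, not the hypervisor's data layout.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model types; names loosely follow mem_protect.c. */
enum page_state { PAGE_OWNED, PAGE_SHARED_OWNED, PAGE_SHARED_BORROWED };

struct model_leaf {
	int level;			/* 3 == last level in this model */
	bool valid;
	bool af;
	enum page_state guest_state;
};

struct model_page {
	enum page_state host_state;
	unsigned int share_guest_count;
};

/* Same order of checks as __check_host_unshare_guest(). */
static int model_check(const struct model_leaf *l, const struct model_page *p)
{
	if (l->level != 3)
		return -E2BIG;
	if (!l->valid)
		return -ENOENT;
	if (l->guest_state != PAGE_SHARED_BORROWED)
		return -EPERM;
	if (p->host_state != PAGE_SHARED_OWNED)
		return -EPERM;
	if (!p->share_guest_count)
		return -EINVAL;
	return 0;
}

/* Model of __pkvm_host_mkyoung_guest(): set AF only if the check passes. */
static int model_mkyoung(struct model_leaf *l, const struct model_page *p)
{
	int ret = model_check(l, p);

	if (!ret)
		l->af = true;
	return ret;
}
```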

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/kvm_asm.h              |  1 +
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  1 +
 arch/arm64/kvm/hyp/nvhe/hyp-main.c            | 19 ++++++++++++++++++
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 20 +++++++++++++++++++
 4 files changed, 41 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index cb676017d591..6178e12a0dbc 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -70,6 +70,7 @@ enum __kvm_host_smccc_func {
 	__KVM_HOST_SMCCC_FUNC___pkvm_host_relax_guest_perms,
 	__KVM_HOST_SMCCC_FUNC___pkvm_host_wrprotect_guest,
 	__KVM_HOST_SMCCC_FUNC___pkvm_host_test_clear_young_guest,
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_mkyoung_guest,
 	__KVM_HOST_SMCCC_FUNC___kvm_adjust_pc,
 	__KVM_HOST_SMCCC_FUNC___kvm_vcpu_run,
 	__KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 554ce31882e6..3ae0c3ecff48 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -44,6 +44,7 @@ int __pkvm_host_unshare_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm);
 int __pkvm_host_relax_guest_perms(u64 gfn, enum kvm_pgtable_prot prot, struct pkvm_hyp_vcpu *vcpu);
 int __pkvm_host_wrprotect_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm);
 int __pkvm_host_test_clear_young_guest(u64 gfn, bool mkold, struct pkvm_hyp_vm *vm);
+int __pkvm_host_mkyoung_guest(u64 gfn, struct pkvm_hyp_vcpu *vcpu);
 
 bool addr_is_memory(phys_addr_t phys);
 int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 67cb6e284180..de0012a75827 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -333,6 +333,24 @@ static void handle___pkvm_host_test_clear_young_guest(struct kvm_cpu_context *ho
 	cpu_reg(host_ctxt, 1) = ret;
 }
 
+static void handle___pkvm_host_mkyoung_guest(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(u64, gfn, host_ctxt, 1);
+	struct pkvm_hyp_vcpu *hyp_vcpu;
+	int ret = -EINVAL;
+
+	if (!is_protected_kvm_enabled())
+		goto out;
+
+	hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
+	if (!hyp_vcpu || pkvm_hyp_vcpu_is_protected(hyp_vcpu))
+		goto out;
+
+	ret = __pkvm_host_mkyoung_guest(gfn, hyp_vcpu);
+out:
+	cpu_reg(host_ctxt, 1) =  ret;
+}
+
 static void handle___kvm_adjust_pc(struct kvm_cpu_context *host_ctxt)
 {
 	DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
@@ -547,6 +565,7 @@ static const hcall_t host_hcall[] = {
 	HANDLE_FUNC(__pkvm_host_relax_guest_perms),
 	HANDLE_FUNC(__pkvm_host_wrprotect_guest),
 	HANDLE_FUNC(__pkvm_host_test_clear_young_guest),
+	HANDLE_FUNC(__pkvm_host_mkyoung_guest),
 	HANDLE_FUNC(__kvm_adjust_pc),
 	HANDLE_FUNC(__kvm_vcpu_run),
 	HANDLE_FUNC(__kvm_flush_vm_context),
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 0e064a7ed7c4..7605bd7f80b5 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -1541,3 +1541,23 @@ int __pkvm_host_test_clear_young_guest(u64 gfn, bool mkold, struct pkvm_hyp_vm *
 
 	return ret;
 }
+
+int __pkvm_host_mkyoung_guest(u64 gfn, struct pkvm_hyp_vcpu *vcpu)
+{
+	struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
+	u64 ipa = hyp_pfn_to_phys(gfn);
+	u64 phys;
+	int ret;
+
+	host_lock_component();
+	guest_lock_component(vm);
+
+	ret = __check_host_unshare_guest(vm, &phys, ipa);
+	if (!ret)
+		kvm_pgtable_stage2_mkyoung(&vm->pgt, ipa, 0);
+
+	guest_unlock_component(vm);
+	host_unlock_component();
+
+	return ret;
+}
-- 
2.47.0.338.g60cca15819-goog




* [PATCH v2 16/18] KVM: arm64: Introduce __pkvm_tlb_flush_vmid()
  2024-12-03 10:37 [PATCH v2 00/18] KVM: arm64: Non-protected guest stage-2 support for pKVM Quentin Perret
                   ` (14 preceding siblings ...)
  2024-12-03 10:37 ` [PATCH v2 15/18] KVM: arm64: Introduce __pkvm_host_mkyoung_guest() Quentin Perret
@ 2024-12-03 10:37 ` Quentin Perret
  2024-12-10 15:23   ` Fuad Tabba
  2024-12-03 10:37 ` [PATCH v2 17/18] KVM: arm64: Introduce the EL1 pKVM MMU Quentin Perret
  2024-12-03 10:37 ` [PATCH v2 18/18] KVM: arm64: Plumb the pKVM MMU in KVM Quentin Perret
  17 siblings, 1 reply; 50+ messages in thread
From: Quentin Perret @ 2024-12-03 10:37 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon
  Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
	kvmarm, linux-kernel

Introduce a new hypercall to flush the TLBs of non-protected guests. The
host kernel will be responsible for issuing this hypercall after changing
stage-2 permissions using the __pkvm_host_relax_guest_perms() or
__pkvm_host_wrprotect_guest() paths. This is left under the host's
responsibility for performance reasons.

Note however that the TLB maintenance for all *unmap* operations still
remains entirely under the hypervisor's responsibility for security
reasons -- an unmapped page may be donated to another entity, so a stale
TLB entry could be used to leak private data.
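
The split of responsibilities can be illustrated with a toy model of a single stage-2 entry and one cached TLB copy of it. After write-protecting the entry without any TLB maintenance, the stale cached translation still permits writes until the host issues the VMID-wide flush. The model_* names and the one-entry TLB are simplifications for illustration, not how the hardware or the hypervisor track translations.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: one stage-2 leaf and one cached TLB copy of its W bit. */
struct model_mmu {
	bool pte_w;		/* write permission in the page-table */
	bool tlb_valid;		/* is a translation cached? */
	bool tlb_w;		/* cached write permission */
};

/* A guest write consults the TLB first, walking the table on a miss. */
static bool model_guest_write_allowed(struct model_mmu *m)
{
	if (!m->tlb_valid) {
		m->tlb_valid = true;
		m->tlb_w = m->pte_w;
	}
	return m->tlb_w;
}

/* Like __pkvm_host_wrprotect_guest(): updates the entry, no TLBI. */
static void model_wrprotect(struct model_mmu *m)
{
	m->pte_w = false;
}

/* Like __pkvm_tlb_flush_vmid(): drop every cached translation. */
static void model_tlb_flush_vmid(struct model_mmu *m)
{
	m->tlb_valid = false;
}
```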

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/kvm_asm.h   |  1 +
 arch/arm64/kvm/hyp/nvhe/hyp-main.c | 17 +++++++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 6178e12a0dbc..df6237d0459c 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -87,6 +87,7 @@ enum __kvm_host_smccc_func {
 	__KVM_HOST_SMCCC_FUNC___pkvm_teardown_vm,
 	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_load,
 	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_put,
+	__KVM_HOST_SMCCC_FUNC___pkvm_tlb_flush_vmid,
 };
 
 #define DECLARE_KVM_VHE_SYM(sym)	extern char sym[]
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index de0012a75827..219d7fb850ec 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -398,6 +398,22 @@ static void handle___kvm_tlb_flush_vmid(struct kvm_cpu_context *host_ctxt)
 	__kvm_tlb_flush_vmid(kern_hyp_va(mmu));
 }
 
+static void handle___pkvm_tlb_flush_vmid(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
+	struct pkvm_hyp_vm *hyp_vm;
+
+	if (!is_protected_kvm_enabled())
+		return;
+
+	hyp_vm = get_pkvm_hyp_vm(handle);
+	if (!hyp_vm)
+		return;
+
+	__kvm_tlb_flush_vmid(&hyp_vm->kvm.arch.mmu);
+	put_pkvm_hyp_vm(hyp_vm);
+}
+
 static void handle___kvm_flush_cpu_context(struct kvm_cpu_context *host_ctxt)
 {
 	DECLARE_REG(struct kvm_s2_mmu *, mmu, host_ctxt, 1);
@@ -582,6 +598,7 @@ static const hcall_t host_hcall[] = {
 	HANDLE_FUNC(__pkvm_teardown_vm),
 	HANDLE_FUNC(__pkvm_vcpu_load),
 	HANDLE_FUNC(__pkvm_vcpu_put),
+	HANDLE_FUNC(__pkvm_tlb_flush_vmid),
 };
 
 static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
-- 
2.47.0.338.g60cca15819-goog




* [PATCH v2 17/18] KVM: arm64: Introduce the EL1 pKVM MMU
  2024-12-03 10:37 [PATCH v2 00/18] KVM: arm64: Non-protected guest stage-2 support for pKVM Quentin Perret
                   ` (15 preceding siblings ...)
  2024-12-03 10:37 ` [PATCH v2 16/18] KVM: arm64: Introduce __pkvm_tlb_flush_vmid() Quentin Perret
@ 2024-12-03 10:37 ` Quentin Perret
  2024-12-12 11:35   ` Marc Zyngier
  2024-12-03 10:37 ` [PATCH v2 18/18] KVM: arm64: Plumb the pKVM MMU in KVM Quentin Perret
  17 siblings, 1 reply; 50+ messages in thread
From: Quentin Perret @ 2024-12-03 10:37 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon
  Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
	kvmarm, linux-kernel

Introduce a set of helper functions that allow manipulating the pKVM
guest stage-2 page-tables from EL1 using pKVM's HVC interface.

Each helper has an exact one-to-one correspondence with the traditional
kvm_pgtable_stage2_*() functions from pgtable.c, with a strictly
matching prototype. This will ease the plumbing in mmu.c later on.
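
Because the prototypes match, a caller can treat the two implementations as interchangeable. The sketch below illustrates that with a function pointer chosen at runtime; the wrprotect_fn_t type, the model_* stand-ins and pick_wrprotect() are hypothetical and do not show how the series actually selects the pKVM MMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Both backends share one prototype, as the pkvm_pgtable_*() helpers do. */
typedef int (*wrprotect_fn_t)(void *pgt, uint64_t addr, uint64_t size);

static unsigned int nr_stage2_calls, nr_pkvm_calls;

/* Hypothetical stand-in for kvm_pgtable_stage2_wrprotect(). */
static int model_stage2_wrprotect(void *pgt, uint64_t addr, uint64_t size)
{
	(void)pgt; (void)addr; (void)size;
	nr_stage2_calls++;
	return 0;
}

/* Hypothetical stand-in for pkvm_pgtable_wrprotect(). */
static int model_pkvm_wrprotect(void *pgt, uint64_t addr, uint64_t size)
{
	(void)pgt; (void)addr; (void)size;
	nr_pkvm_calls++;
	return 0;
}

/* The caller's code is identical whichever backend is returned. */
static wrprotect_fn_t pick_wrprotect(bool protected_mode)
{
	return protected_mode ? model_pkvm_wrprotect : model_stage2_wrprotect;
}
```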

These callbacks track the gfn->pfn mappings in a simple rb-tree indexed
by IPA in lieu of a page-table. This rb-tree is kept in sync with pKVM's
state and is protected by a new rwlock -- the existing mmu_lock
protection does not suffice in the map() path, where the tree must be
modified while user_mem_abort() only holds the mmu_lock for read.
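
The range walk used by the EL1 helpers (find_first_mapping_node() plus for_each_mapping_in_range()) relies on a property of binary search trees. When the start gfn is not present, the last node visited on the search path is its in-order predecessor or successor, so skipping at most one node below the start is enough. The sketch below reproduces that walk in userspace with a plain unbalanced BST standing in for the kernel rb-tree; all model_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for struct pkvm_mapping plus tree links. */
struct model_mapping {
	uint64_t gfn, pfn;
	struct model_mapping *left, *right, *parent;
};

/* Insert keyed by gfn; returns NULL if the gfn is already tracked. */
static struct model_mapping *model_insert(struct model_mapping **root,
					  uint64_t gfn, uint64_t pfn)
{
	struct model_mapping **link = root, *parent = NULL, *m;

	while (*link) {
		parent = *link;
		if (gfn == parent->gfn)
			return NULL;
		link = gfn < parent->gfn ? &parent->left : &parent->right;
	}
	m = calloc(1, sizeof(*m));
	if (!m)
		return NULL;
	m->gfn = gfn;
	m->pfn = pfn;
	m->parent = parent;
	*link = m;
	return m;
}

/* In-order successor, playing the role of rb_next(). */
static struct model_mapping *model_next(struct model_mapping *m)
{
	if (m->right) {
		for (m = m->right; m->left; m = m->left)
			;
		return m;
	}
	while (m->parent && m == m->parent->right)
		m = m->parent;
	return m->parent;
}

/* Same descent as find_first_mapping_node(): exact hit, else last node seen. */
static struct model_mapping *model_find_first(struct model_mapping *root, uint64_t gfn)
{
	struct model_mapping *node = root, *prev = NULL;

	while (node) {
		if (node->gfn == gfn)
			return node;
		prev = node;
		node = gfn < node->gfn ? node->left : node->right;
	}
	return prev;
}

/* Collect the gfns in [start, end), skipping a predecessor if one is returned. */
static size_t model_range(struct model_mapping *root, uint64_t start,
			  uint64_t end, uint64_t *out)
{
	size_t n = 0;

	for (struct model_mapping *m = model_find_first(root, start); m; m = model_next(m)) {
		if (m->gfn < start)
			continue;
		if (m->gfn >= end)
			break;
		out[n++] = m->gfn;
	}
	return n;
}
```

One difference from the kernel macro: for_each_mapping_in_range() fetches rb_next() before running the loop body, which is what allows pkvm_pgtable_unmap() to rb_erase() and kfree() the current mapping mid-walk. This read-only model simply advances afterwards.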

Signed-off-by: Quentin Perret <qperret@google.com>
---

The embedded union inside struct kvm_pgtable is arguably a bit horrible
currently... I considered making the pgt argument to all kvm_pgtable_*()
functions an opaque void * ptr, and moving the definition of
struct kvm_pgtable to pgtable.c and the pkvm version into pkvm.c. Given
that the allocation of that data-structure is done by the caller, that
means we'd need to expose kvm_pgtable_get_pgd_size() or something that
each MMU (pgtable.c and pkvm.c) would have to implement and things like
that. But that felt like a bigger surgery, so I went with the simpler
option. Thoughts welcome :-)

Similarly, happy to drop the mappings_lock if we want to teach
user_mem_abort() about taking a write lock on the mmu_lock in the pKVM
case, but again this implementation is the least invasive into normal
KVM so that felt like a reasonable starting point.
---
 arch/arm64/include/asm/kvm_host.h    |   1 +
 arch/arm64/include/asm/kvm_pgtable.h |  27 ++--
 arch/arm64/include/asm/kvm_pkvm.h    |  28 ++++
 arch/arm64/kvm/pkvm.c                | 195 +++++++++++++++++++++++++++
 4 files changed, 242 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index f75988e3515b..05936b57a3a4 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -85,6 +85,7 @@ void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu);
 struct kvm_hyp_memcache {
 	phys_addr_t head;
 	unsigned long nr_pages;
+	struct pkvm_mapping *mapping; /* only used from EL1 */
 };
 
 static inline void push_hyp_memcache(struct kvm_hyp_memcache *mc,
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 04418b5e3004..d24d18874015 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -412,15 +412,24 @@ static inline bool kvm_pgtable_walk_lock_held(void)
  *			be used instead of block mappings.
  */
 struct kvm_pgtable {
-	u32					ia_bits;
-	s8					start_level;
-	kvm_pteref_t				pgd;
-	struct kvm_pgtable_mm_ops		*mm_ops;
-
-	/* Stage-2 only */
-	struct kvm_s2_mmu			*mmu;
-	enum kvm_pgtable_stage2_flags		flags;
-	kvm_pgtable_force_pte_cb_t		force_pte_cb;
+	union {
+		struct {
+			u32					ia_bits;
+			s8					start_level;
+			kvm_pteref_t				pgd;
+			struct kvm_pgtable_mm_ops		*mm_ops;
+
+			/* Stage-2 only */
+			struct kvm_s2_mmu			*mmu;
+			enum kvm_pgtable_stage2_flags		flags;
+			kvm_pgtable_force_pte_cb_t		force_pte_cb;
+		};
+		struct {
+			struct kvm				*kvm;
+			struct rb_root				mappings;
+			rwlock_t				mappings_lock;
+		} pkvm;
+	};
 };
 
 /**
diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
index cd56acd9a842..84211d5daf87 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -11,6 +11,12 @@
 #include <linux/scatterlist.h>
 #include <asm/kvm_pgtable.h>
 
+struct pkvm_mapping {
+	u64 gfn;
+	u64 pfn;
+	struct rb_node node;
+};
+
 /* Maximum number of VMs that can co-exist under pKVM. */
 #define KVM_MAX_PVMS 255
 
@@ -137,4 +143,26 @@ static inline size_t pkvm_host_sve_state_size(void)
 			SVE_SIG_REGS_SIZE(sve_vq_from_vl(kvm_host_sve_max_vl)));
 }
 
+static inline pkvm_handle_t pkvm_pgt_to_handle(struct kvm_pgtable *pgt)
+{
+	return pgt->pkvm.kvm->arch.pkvm.handle;
+}
+
+int pkvm_pgtable_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu, struct kvm_pgtable_mm_ops *mm_ops);
+void pkvm_pgtable_destroy(struct kvm_pgtable *pgt);
+int pkvm_pgtable_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
+			   u64 phys, enum kvm_pgtable_prot prot,
+			   void *mc, enum kvm_pgtable_walk_flags flags);
+int pkvm_pgtable_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size);
+int pkvm_pgtable_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size);
+int pkvm_pgtable_flush(struct kvm_pgtable *pgt, u64 addr, u64 size);
+bool pkvm_pgtable_test_clear_young(struct kvm_pgtable *pgt, u64 addr, u64 size, bool mkold);
+int pkvm_pgtable_relax_perms(struct kvm_pgtable *pgt, u64 addr, enum kvm_pgtable_prot prot,
+			     enum kvm_pgtable_walk_flags flags);
+void pkvm_pgtable_mkyoung(struct kvm_pgtable *pgt, u64 addr, enum kvm_pgtable_walk_flags flags);
+int pkvm_pgtable_split(struct kvm_pgtable *pgt, u64 addr, u64 size, struct kvm_mmu_memory_cache *mc);
+void pkvm_pgtable_free_unlinked(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, s8 level);
+kvm_pte_t *pkvm_pgtable_create_unlinked(struct kvm_pgtable *pgt, u64 phys, s8 level,
+					enum kvm_pgtable_prot prot, void *mc, bool force_pte);
+
 #endif	/* __ARM64_KVM_PKVM_H__ */
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 85117ea8f351..9c648a510671 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -7,6 +7,7 @@
 #include <linux/init.h>
 #include <linux/kmemleak.h>
 #include <linux/kvm_host.h>
+#include <asm/kvm_mmu.h>
 #include <linux/memblock.h>
 #include <linux/mutex.h>
 #include <linux/sort.h>
@@ -268,3 +269,197 @@ static int __init finalize_pkvm(void)
 	return ret;
 }
 device_initcall_sync(finalize_pkvm);
+
+static int cmp_mappings(struct rb_node *node, const struct rb_node *parent)
+{
+	struct pkvm_mapping *a = rb_entry(node, struct pkvm_mapping, node);
+	struct pkvm_mapping *b = rb_entry(parent, struct pkvm_mapping, node);
+
+	if (a->gfn < b->gfn)
+		return -1;
+	if (a->gfn > b->gfn)
+		return 1;
+	return 0;
+}
+
+static struct rb_node *find_first_mapping_node(struct rb_root *root, u64 gfn)
+{
+	struct rb_node *node = root->rb_node, *prev = NULL;
+	struct pkvm_mapping *mapping;
+
+	while (node) {
+		mapping = rb_entry(node, struct pkvm_mapping, node);
+		if (mapping->gfn == gfn)
+			return node;
+		prev = node;
+		node = (gfn < mapping->gfn) ? node->rb_left : node->rb_right;
+	}
+
+	return prev;
+}
+
+#define for_each_mapping_in_range(pgt, start_ipa, end_ipa, mapping, tmp)				\
+	for (tmp = find_first_mapping_node(&pgt->pkvm.mappings, ((start_ipa) >> PAGE_SHIFT));		\
+	     tmp && ({ mapping = rb_entry(tmp, struct pkvm_mapping, node); tmp = rb_next(tmp); 1; });)	\
+		if (mapping->gfn < ((start_ipa) >> PAGE_SHIFT))						\
+			continue;									\
+		else if (mapping->gfn >= ((end_ipa) >> PAGE_SHIFT))					\
+			break;										\
+		else
+
+int pkvm_pgtable_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu, struct kvm_pgtable_mm_ops *mm_ops)
+{
+	pgt->pkvm.kvm		= kvm_s2_mmu_to_kvm(mmu);
+	pgt->pkvm.mappings	= RB_ROOT;
+	rwlock_init(&pgt->pkvm.mappings_lock);
+
+	return 0;
+}
+
+void pkvm_pgtable_destroy(struct kvm_pgtable *pgt)
+{
+	pkvm_handle_t handle = pkvm_pgt_to_handle(pgt);
+	struct pkvm_mapping *mapping;
+	struct rb_node *node;
+
+	if (!handle)
+		return;
+
+	node = rb_first(&pgt->pkvm.mappings);
+	while (node) {
+		mapping = rb_entry(node, struct pkvm_mapping, node);
+		kvm_call_hyp_nvhe(__pkvm_host_unshare_guest, handle, mapping->gfn);
+		node = rb_next(node);
+		rb_erase(&mapping->node, &pgt->pkvm.mappings);
+		kfree(mapping);
+	}
+}
+
+int pkvm_pgtable_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
+			   u64 phys, enum kvm_pgtable_prot prot,
+			   void *mc, enum kvm_pgtable_walk_flags flags)
+{
+	struct pkvm_mapping *mapping = NULL;
+	struct kvm_hyp_memcache *cache = mc;
+	u64 gfn = addr >> PAGE_SHIFT;
+	u64 pfn = phys >> PAGE_SHIFT;
+	int ret;
+
+	if (size != PAGE_SIZE)
+		return -EINVAL;
+
+	write_lock(&pgt->pkvm.mappings_lock);
+	ret = kvm_call_hyp_nvhe(__pkvm_host_share_guest, pfn, gfn, prot);
+	if (ret) {
+		/* Is the gfn already mapped due to a racing vCPU? */
+		if (ret == -EPERM)
+			ret = -EAGAIN;
+		goto unlock;
+	}
+
+	swap(mapping, cache->mapping);
+	mapping->gfn = gfn;
+	mapping->pfn = pfn;
+	WARN_ON(rb_find_add(&mapping->node, &pgt->pkvm.mappings, cmp_mappings));
+unlock:
+	write_unlock(&pgt->pkvm.mappings_lock);
+
+	return ret;
+}
+
+int pkvm_pgtable_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
+{
+	pkvm_handle_t handle = pkvm_pgt_to_handle(pgt);
+	struct pkvm_mapping *mapping;
+	struct rb_node *tmp;
+	int ret = 0;
+
+	write_lock(&pgt->pkvm.mappings_lock);
+	for_each_mapping_in_range(pgt, addr, addr + size, mapping, tmp) {
+		ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_guest, handle, mapping->gfn);
+		if (WARN_ON(ret))
+			break;
+
+		rb_erase(&mapping->node, &pgt->pkvm.mappings);
+		kfree(mapping);
+	}
+	write_unlock(&pgt->pkvm.mappings_lock);
+
+	return ret;
+}
+
+int pkvm_pgtable_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
+{
+	pkvm_handle_t handle = pkvm_pgt_to_handle(pgt);
+	struct pkvm_mapping *mapping;
+	struct rb_node *tmp;
+	int ret = 0;
+
+	read_lock(&pgt->pkvm.mappings_lock);
+	for_each_mapping_in_range(pgt, addr, addr + size, mapping, tmp) {
+		ret = kvm_call_hyp_nvhe(__pkvm_host_wrprotect_guest, handle, mapping->gfn);
+		if (WARN_ON(ret))
+			break;
+	}
+	read_unlock(&pgt->pkvm.mappings_lock);
+
+	return ret;
+}
+
+int pkvm_pgtable_flush(struct kvm_pgtable *pgt, u64 addr, u64 size)
+{
+	struct pkvm_mapping *mapping;
+	struct rb_node *tmp;
+
+	read_lock(&pgt->pkvm.mappings_lock);
+	for_each_mapping_in_range(pgt, addr, addr + size, mapping, tmp)
+		__clean_dcache_guest_page(pfn_to_kaddr(mapping->pfn), PAGE_SIZE);
+	read_unlock(&pgt->pkvm.mappings_lock);
+
+	return 0;
+}
+
+bool pkvm_pgtable_test_clear_young(struct kvm_pgtable *pgt, u64 addr, u64 size, bool mkold)
+{
+	pkvm_handle_t handle = pkvm_pgt_to_handle(pgt);
+	struct pkvm_mapping *mapping;
+	struct rb_node *tmp;
+	bool young = false;
+
+	read_lock(&pgt->pkvm.mappings_lock);
+	for_each_mapping_in_range(pgt, addr, addr + size, mapping, tmp)
+		young |= kvm_call_hyp_nvhe(__pkvm_host_test_clear_young_guest, handle, mapping->gfn,
+					   mkold);
+	read_unlock(&pgt->pkvm.mappings_lock);
+
+	return young;
+}
+
+int pkvm_pgtable_relax_perms(struct kvm_pgtable *pgt, u64 addr, enum kvm_pgtable_prot prot,
+			     enum kvm_pgtable_walk_flags flags)
+{
+	return kvm_call_hyp_nvhe(__pkvm_host_relax_guest_perms, addr >> PAGE_SHIFT, prot);
+}
+
+void pkvm_pgtable_mkyoung(struct kvm_pgtable *pgt, u64 addr, enum kvm_pgtable_walk_flags flags)
+{
+	WARN_ON(kvm_call_hyp_nvhe(__pkvm_host_mkyoung_guest, addr >> PAGE_SHIFT));
+}
+
+void pkvm_pgtable_free_unlinked(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, s8 level)
+{
+	WARN_ON(1);
+}
+
+kvm_pte_t *pkvm_pgtable_create_unlinked(struct kvm_pgtable *pgt, u64 phys, s8 level,
+					enum kvm_pgtable_prot prot, void *mc, bool force_pte)
+{
+	WARN_ON(1);
+	return NULL;
+}
+
+int pkvm_pgtable_split(struct kvm_pgtable *pgt, u64 addr, u64 size, struct kvm_mmu_memory_cache *mc)
+{
+	WARN_ON(1);
+	return -EINVAL;
+}
-- 
2.47.0.338.g60cca15819-goog
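The for_each_mapping_in_range() macro in the patch above walks the rb-tree of mappings in a specific order. It first looks up a starting node by gfn. It then fetches the successor before using the current node, skips nodes below the start, and stops at the first node at or past the end. The userspace sketch below reproduces that pattern under several assumptions. It uses a plain unbalanced binary search tree with parent pointers instead of <linux/rbtree.h>. All names in it (struct mapping, mapping_insert(), find_first(), next_mapping(), collect_range()) are hypothetical. find_first() is assumed to behave like the visible tail of find_first_mapping_node(): it returns an exact match, or else the last node visited by the search.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pkvm_mapping: one node per guest page. */
struct mapping {
	uint64_t gfn, pfn;
	struct mapping *left, *right, *parent;
};

/* Plain BST insert keyed by gfn (no rebalancing, unlike an rb-tree). */
static void mapping_insert(struct mapping **root, struct mapping *m)
{
	struct mapping *parent = NULL, **link = root;

	while (*link) {
		parent = *link;
		/* Duplicate gfns are a bug, cf. WARN_ON(rb_find_add(...)). */
		assert(m->gfn != parent->gfn);
		link = (m->gfn < parent->gfn) ? &parent->left : &parent->right;
	}
	m->left = m->right = NULL;
	m->parent = parent;
	*link = m;
}

/* Return the node matching gfn, else the last node visited by the search. */
static struct mapping *find_first(struct mapping *node, uint64_t gfn)
{
	struct mapping *prev = NULL;

	while (node) {
		if (node->gfn == gfn)
			return node;
		prev = node;
		node = (gfn < node->gfn) ? node->left : node->right;
	}
	return prev;
}

/* In-order successor, playing the role of rb_next(). */
static struct mapping *next_mapping(struct mapping *m)
{
	if (m->right) {
		m = m->right;
		while (m->left)
			m = m->left;
		return m;
	}
	while (m->parent && m == m->parent->right)
		m = m->parent;
	return m->parent;
}

/* Store the gfns in [start_gfn, end_gfn) into out[], in ascending order. */
static size_t collect_range(struct mapping *root, uint64_t start_gfn,
			    uint64_t end_gfn, uint64_t *out, size_t max)
{
	struct mapping *m, *tmp = find_first(root, start_gfn);
	size_t n = 0;

	while (tmp) {
		m = tmp;
		tmp = next_mapping(tmp);	/* advance first, like the macro */
		if (m->gfn < start_gfn)
			continue;		/* search landed just below the range */
		if (m->gfn >= end_gfn)
			break;			/* past the end: stop walking */
		if (n < max)
			out[n++] = m->gfn;
	}
	return n;
}
```

When a search fails, the last node it visits is either the immediate predecessor or the immediate successor of the target key. At most one node therefore needs to be skipped before the walk enters the range.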




* [PATCH v2 18/18] KVM: arm64: Plumb the pKVM MMU in KVM
  2024-12-03 10:37 [PATCH v2 00/18] KVM: arm64: Non-protected guest stage-2 support for pKVM Quentin Perret
                   ` (16 preceding siblings ...)
  2024-12-03 10:37 ` [PATCH v2 17/18] KVM: arm64: Introduce the EL1 pKVM MMU Quentin Perret
@ 2024-12-03 10:37 ` Quentin Perret
  17 siblings, 0 replies; 50+ messages in thread
From: Quentin Perret @ 2024-12-03 10:37 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon
  Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
	kvmarm, linux-kernel

Introduce the KVM_PGT_S2() helper macro to allow easily switching from
the traditional pgtable code to the pKVM version in mmu.c. The cost of
this indirection is expected to be minimal, as is_protected_kvm_enabled()
is backed by a static key.

With this, everything is in place to allow the delegation of
non-protected guest stage-2 page-tables to pKVM, so let's stop using the
host's kvm_s2_mmu from EL2 and enjoy the ride.

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/kvm/arm.c               |   9 ++-
 arch/arm64/kvm/hyp/nvhe/hyp-main.c |   2 -
 arch/arm64/kvm/mmu.c               | 103 +++++++++++++++++++++--------
 3 files changed, 83 insertions(+), 31 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 55cc62b2f469..9bcbc7b8ed38 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -502,7 +502,10 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
 
 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 {
-	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
+	if (!is_protected_kvm_enabled())
+		kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
+	else
+		free_hyp_memcache(&vcpu->arch.pkvm_memcache);
 	kvm_timer_vcpu_terminate(vcpu);
 	kvm_pmu_vcpu_destroy(vcpu);
 	kvm_vgic_vcpu_destroy(vcpu);
@@ -574,6 +577,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	struct kvm_s2_mmu *mmu;
 	int *last_ran;
 
+	if (is_protected_kvm_enabled())
+		goto nommu;
+
 	if (vcpu_has_nv(vcpu))
 		kvm_vcpu_load_hw_mmu(vcpu);
 
@@ -594,6 +600,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 		*last_ran = vcpu->vcpu_idx;
 	}
 
+nommu:
 	vcpu->cpu = cpu;
 
 	kvm_vgic_load(vcpu);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 219d7fb850ec..64c7dc595218 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -103,8 +103,6 @@ static void flush_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
 	/* Limit guest vector length to the maximum supported by the host.  */
 	hyp_vcpu->vcpu.arch.sve_max_vl	= min(host_vcpu->arch.sve_max_vl, kvm_host_sve_max_vl);
 
-	hyp_vcpu->vcpu.arch.hw_mmu	= host_vcpu->arch.hw_mmu;
-
 	hyp_vcpu->vcpu.arch.mdcr_el2	= host_vcpu->arch.mdcr_el2;
 	hyp_vcpu->vcpu.arch.hcr_el2 &= ~(HCR_TWI | HCR_TWE);
 	hyp_vcpu->vcpu.arch.hcr_el2 |= READ_ONCE(host_vcpu->arch.hcr_el2) &
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 641e4fec1659..058bc2c8f3c6 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -15,6 +15,7 @@
 #include <asm/kvm_arm.h>
 #include <asm/kvm_mmu.h>
 #include <asm/kvm_pgtable.h>
+#include <asm/kvm_pkvm.h>
 #include <asm/kvm_ras.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_emulate.h>
@@ -31,6 +32,14 @@ static phys_addr_t __ro_after_init hyp_idmap_vector;
 
 static unsigned long __ro_after_init io_map_base;
 
+#define KVM_PGT_S2(fn, ...)								\
+	({										\
+		typeof(kvm_pgtable_stage2_ ## fn) *__fn = kvm_pgtable_stage2_ ## fn;	\
+		if (is_protected_kvm_enabled())						\
+			__fn = pkvm_pgtable_ ## fn;					\
+		__fn(__VA_ARGS__);							\
+	})
+
 static phys_addr_t __stage2_range_addr_end(phys_addr_t addr, phys_addr_t end,
 					   phys_addr_t size)
 {
@@ -147,7 +156,7 @@ static int kvm_mmu_split_huge_pages(struct kvm *kvm, phys_addr_t addr,
 			return -EINVAL;
 
 		next = __stage2_range_addr_end(addr, end, chunk_size);
-		ret = kvm_pgtable_stage2_split(pgt, addr, next - addr, cache);
+		ret = KVM_PGT_S2(split, pgt, addr, next - addr, cache);
 		if (ret)
 			break;
 	} while (addr = next, addr != end);
@@ -168,15 +177,23 @@ static bool memslot_is_logging(struct kvm_memory_slot *memslot)
  */
 int kvm_arch_flush_remote_tlbs(struct kvm *kvm)
 {
-	kvm_call_hyp(__kvm_tlb_flush_vmid, &kvm->arch.mmu);
+	if (is_protected_kvm_enabled())
+		kvm_call_hyp_nvhe(__pkvm_tlb_flush_vmid, kvm->arch.pkvm.handle);
+	else
+		kvm_call_hyp(__kvm_tlb_flush_vmid, &kvm->arch.mmu);
 	return 0;
 }
 
 int kvm_arch_flush_remote_tlbs_range(struct kvm *kvm,
 				      gfn_t gfn, u64 nr_pages)
 {
-	kvm_tlb_flush_vmid_range(&kvm->arch.mmu,
-				gfn << PAGE_SHIFT, nr_pages << PAGE_SHIFT);
+	u64 size = nr_pages << PAGE_SHIFT;
+	u64 addr = gfn << PAGE_SHIFT;
+
+	if (is_protected_kvm_enabled())
+		kvm_call_hyp_nvhe(__pkvm_tlb_flush_vmid, kvm->arch.pkvm.handle);
+	else
+		kvm_tlb_flush_vmid_range(&kvm->arch.mmu, addr, size);
 	return 0;
 }
 
@@ -225,7 +242,7 @@ static void stage2_free_unlinked_table_rcu_cb(struct rcu_head *head)
 	void *pgtable = page_to_virt(page);
 	s8 level = page_private(page);
 
-	kvm_pgtable_stage2_free_unlinked(&kvm_s2_mm_ops, pgtable, level);
+	KVM_PGT_S2(free_unlinked, &kvm_s2_mm_ops, pgtable, level);
 }
 
 static void stage2_free_unlinked_table(void *addr, s8 level)
@@ -280,6 +297,11 @@ static void invalidate_icache_guest_page(void *va, size_t size)
 	__invalidate_icache_guest_page(va, size);
 }
 
+static int kvm_s2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
+{
+	return KVM_PGT_S2(unmap, pgt, addr, size);
+}
+
 /*
  * Unmapping vs dcache management:
  *
@@ -324,8 +346,7 @@ static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64
 
 	lockdep_assert_held_write(&kvm->mmu_lock);
 	WARN_ON(size & ~PAGE_MASK);
-	WARN_ON(stage2_apply_range(mmu, start, end, kvm_pgtable_stage2_unmap,
-				   may_block));
+	WARN_ON(stage2_apply_range(mmu, start, end, kvm_s2_unmap, may_block));
 }
 
 void kvm_stage2_unmap_range(struct kvm_s2_mmu *mmu, phys_addr_t start,
@@ -334,9 +355,14 @@ void kvm_stage2_unmap_range(struct kvm_s2_mmu *mmu, phys_addr_t start,
 	__unmap_stage2_range(mmu, start, size, may_block);
 }
 
+static int kvm_s2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size)
+{
+	return KVM_PGT_S2(flush, pgt, addr, size);
+}
+
 void kvm_stage2_flush_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end)
 {
-	stage2_apply_range_resched(mmu, addr, end, kvm_pgtable_stage2_flush);
+	stage2_apply_range_resched(mmu, addr, end, kvm_s2_flush);
 }
 
 static void stage2_flush_memslot(struct kvm *kvm,
@@ -942,10 +968,14 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
 		return -ENOMEM;
 
 	mmu->arch = &kvm->arch;
-	err = kvm_pgtable_stage2_init(pgt, mmu, &kvm_s2_mm_ops);
+	err = KVM_PGT_S2(init, pgt, mmu, &kvm_s2_mm_ops);
 	if (err)
 		goto out_free_pgtable;
 
+	mmu->pgt = pgt;
+	if (is_protected_kvm_enabled())
+		return 0;
+
 	mmu->last_vcpu_ran = alloc_percpu(typeof(*mmu->last_vcpu_ran));
 	if (!mmu->last_vcpu_ran) {
 		err = -ENOMEM;
@@ -959,7 +989,6 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
 	mmu->split_page_chunk_size = KVM_ARM_EAGER_SPLIT_CHUNK_SIZE_DEFAULT;
 	mmu->split_page_cache.gfp_zero = __GFP_ZERO;
 
-	mmu->pgt = pgt;
 	mmu->pgd_phys = __pa(pgt->pgd);
 
 	if (kvm_is_nested_s2_mmu(kvm, mmu))
@@ -968,7 +997,7 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
 	return 0;
 
 out_destroy_pgtable:
-	kvm_pgtable_stage2_destroy(pgt);
+	KVM_PGT_S2(destroy, pgt);
 out_free_pgtable:
 	kfree(pgt);
 	return err;
@@ -1065,7 +1094,7 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
 	write_unlock(&kvm->mmu_lock);
 
 	if (pgt) {
-		kvm_pgtable_stage2_destroy(pgt);
+		KVM_PGT_S2(destroy, pgt);
 		kfree(pgt);
 	}
 }
@@ -1082,9 +1111,11 @@ static void *hyp_mc_alloc_fn(void *unused)
 
 void free_hyp_memcache(struct kvm_hyp_memcache *mc)
 {
-	if (is_protected_kvm_enabled())
-		__free_hyp_memcache(mc, hyp_mc_free_fn,
-				    kvm_host_va, NULL);
+	if (!is_protected_kvm_enabled())
+		return;
+
+	kfree(mc->mapping);
+	__free_hyp_memcache(mc, hyp_mc_free_fn, kvm_host_va, NULL);
 }
 
 int topup_hyp_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages)
@@ -1092,6 +1123,12 @@ int topup_hyp_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages)
 	if (!is_protected_kvm_enabled())
 		return 0;
 
+	if (!mc->mapping) {
+		mc->mapping = kzalloc(sizeof(struct pkvm_mapping), GFP_KERNEL_ACCOUNT);
+		if (!mc->mapping)
+			return -ENOMEM;
+	}
+
 	return __topup_hyp_memcache(mc, min_pages, hyp_mc_alloc_fn,
 				    kvm_host_pa, NULL);
 }
@@ -1130,8 +1167,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 			break;
 
 		write_lock(&kvm->mmu_lock);
-		ret = kvm_pgtable_stage2_map(pgt, addr, PAGE_SIZE, pa, prot,
-					     &cache, 0);
+		ret = KVM_PGT_S2(map, pgt, addr, PAGE_SIZE, pa, prot, &cache, 0);
 		write_unlock(&kvm->mmu_lock);
 		if (ret)
 			break;
@@ -1143,6 +1179,10 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 	return ret;
 }
 
+static int kvm_s2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
+{
+	return KVM_PGT_S2(wrprotect, pgt, addr, size);
+}
 /**
  * kvm_stage2_wp_range() - write protect stage2 memory region range
  * @mmu:        The KVM stage-2 MMU pointer
@@ -1151,7 +1191,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
  */
 void kvm_stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end)
 {
-	stage2_apply_range_resched(mmu, addr, end, kvm_pgtable_stage2_wrprotect);
+	stage2_apply_range_resched(mmu, addr, end, kvm_s2_wrprotect);
 }
 
 /**
@@ -1442,9 +1482,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	unsigned long mmu_seq;
 	phys_addr_t ipa = fault_ipa;
 	struct kvm *kvm = vcpu->kvm;
-	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
 	struct vm_area_struct *vma;
 	short vma_shift;
+	void *memcache;
 	gfn_t gfn;
 	kvm_pfn_t pfn;
 	bool logging_active = memslot_is_logging(memslot);
@@ -1472,8 +1512,15 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * and a write fault needs to collapse a block entry into a table.
 	 */
 	if (!fault_is_perm || (logging_active && write_fault)) {
-		ret = kvm_mmu_topup_memory_cache(memcache,
-						 kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu));
+		int min_pages = kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu);
+
+		if (!is_protected_kvm_enabled()) {
+			memcache = &vcpu->arch.mmu_page_cache;
+			ret = kvm_mmu_topup_memory_cache(memcache, min_pages);
+		} else {
+			memcache = &vcpu->arch.pkvm_memcache;
+			ret = topup_hyp_memcache(memcache, min_pages);
+		}
 		if (ret)
 			return ret;
 	}
@@ -1494,7 +1541,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * logging_active is guaranteed to never be true for VM_PFNMAP
 	 * memslots.
 	 */
-	if (logging_active) {
+	if (logging_active || is_protected_kvm_enabled()) {
 		force_pte = true;
 		vma_shift = PAGE_SHIFT;
 	} else {
@@ -1696,9 +1743,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		 * PTE, which will be preserved.
 		 */
 		prot &= ~KVM_NV_GUEST_MAP_SZ;
-		ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot, flags);
+		ret = KVM_PGT_S2(relax_perms, pgt, fault_ipa, prot, flags);
 	} else {
-		ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
+		ret = KVM_PGT_S2(map, pgt, fault_ipa, vma_pagesize,
 					     __pfn_to_phys(pfn), prot,
 					     memcache, flags);
 	}
@@ -1724,7 +1771,7 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
 
 	read_lock(&vcpu->kvm->mmu_lock);
 	mmu = vcpu->arch.hw_mmu;
-	kvm_pgtable_stage2_mkyoung(mmu->pgt, fault_ipa, flags);
+	KVM_PGT_S2(mkyoung, mmu->pgt, fault_ipa, flags);
 	read_unlock(&vcpu->kvm->mmu_lock);
 }
 
@@ -1764,7 +1811,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 		}
 
 		/* Falls between the IPA range and the PARange? */
-		if (fault_ipa >= BIT_ULL(vcpu->arch.hw_mmu->pgt->ia_bits)) {
+		if (fault_ipa >= BIT_ULL(VTCR_EL2_IPA(vcpu->arch.hw_mmu->vtcr))) {
 			fault_ipa |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0);
 
 			if (is_iabt)
@@ -1930,7 +1977,7 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 	if (!kvm->arch.mmu.pgt)
 		return false;
 
-	return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
+	return KVM_PGT_S2(test_clear_young, kvm->arch.mmu.pgt,
 						   range->start << PAGE_SHIFT,
 						   size, true);
 	/*
@@ -1946,7 +1993,7 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 	if (!kvm->arch.mmu.pgt)
 		return false;
 
-	return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
+	return KVM_PGT_S2(test_clear_young, kvm->arch.mmu.pgt,
 						   range->start << PAGE_SHIFT,
 						   size, false);
 }
-- 
2.47.0.338.g60cca15819-goog

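pkvm_pgtable_map() runs with the mappings rwlock held, where a sleeping GFP_KERNEL allocation is not allowed. For this reason, topup_hyp_memcache() in the patch above preallocates one struct pkvm_mapping in mc->mapping. The map path takes it with swap(), and free_hyp_memcache() releases any leftover. Below is a minimal userspace sketch of this preallocate-then-consume pattern. It makes three simplifications: a flat table replaces the rb-tree, all names are hypothetical, and the page-donation part of the real struct kvm_hyp_memcache is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical simplified types: a flat table stands in for the rb-tree. */
struct mapping { uint64_t gfn, pfn; };
struct memcache { struct mapping *mapping; };

#define MAX_GFN	16
static struct mapping *table[MAX_GFN];

/* May allocate (and, in the kernel, sleep): call before taking the lock. */
static int topup(struct memcache *mc)
{
	if (!mc->mapping) {
		mc->mapping = calloc(1, sizeof(*mc->mapping));
		if (!mc->mapping)
			return -ENOMEM;
	}
	return 0;
}

/* Runs "under the lock": never allocates, only consumes the cached node. */
static int map_one(struct memcache *mc, uint64_t gfn, uint64_t pfn)
{
	struct mapping *m;

	if (gfn >= MAX_GFN || !mc->mapping)
		return -EINVAL;
	if (table[gfn])
		return -EAGAIN;		/* lost a race: keep the cached node */

	m = mc->mapping;		/* take ownership, like swap() in the patch */
	mc->mapping = NULL;
	m->gfn = gfn;
	m->pfn = pfn;
	table[gfn] = m;
	return 0;
}

/* Release whatever the fault path did not consume. */
static void free_cache(struct memcache *mc)
{
	free(mc->mapping);
	mc->mapping = NULL;
}
```

The patch turns the hypervisor's -EPERM into -EAGAIN when the map loses a race. In that case, the cached node stays in the memcache and is reused by the next fault rather than being freed and reallocated.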



* Re: [PATCH v2 01/18] KVM: arm64: Change the layout of enum pkvm_page_state
  2024-12-03 10:37 ` [PATCH v2 01/18] KVM: arm64: Change the layout of enum pkvm_page_state Quentin Perret
@ 2024-12-10 12:59   ` Fuad Tabba
  2024-12-10 15:15     ` Quentin Perret
  0 siblings, 1 reply; 50+ messages in thread
From: Fuad Tabba @ 2024-12-10 12:59 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Vincent Donnefort,
	Sebastian Ene, linux-arm-kernel, kvmarm, linux-kernel

Hi Quentin,

On Tue, 3 Dec 2024 at 10:37, Quentin Perret <qperret@google.com> wrote:
>
> The 'concrete' (a.k.a non-meta) page states are currently encoded using
> software bits in PTEs. For performance reasons, the abstract
> pkvm_page_state enum uses the same bits to encode these states as that
> makes conversions from and to PTEs easy.
>
> In order to prepare the ground for moving the 'concrete' state storage
> to the hyp vmemmap, re-arrange the enum to use bits 0 and 1 for this
> purpose.
>
> No functional changes intended.
>
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 17 ++++++++++-------
>  1 file changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> index 0972faccc2af..ca3177481b78 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> @@ -24,25 +24,28 @@
>   */
>  enum pkvm_page_state {
>         PKVM_PAGE_OWNED                 = 0ULL,
> -       PKVM_PAGE_SHARED_OWNED          = KVM_PGTABLE_PROT_SW0,
> -       PKVM_PAGE_SHARED_BORROWED       = KVM_PGTABLE_PROT_SW1,
> -       __PKVM_PAGE_RESERVED            = KVM_PGTABLE_PROT_SW0 |
> -                                         KVM_PGTABLE_PROT_SW1,
> +       PKVM_PAGE_SHARED_OWNED          = BIT(0),
> +       PKVM_PAGE_SHARED_BORROWED       = BIT(1),
> +       __PKVM_PAGE_RESERVED            = BIT(0) | BIT(1),
>
>         /* Meta-states which aren't encoded directly in the PTE's SW bits */
> -       PKVM_NOPAGE,
> +       PKVM_NOPAGE                     = BIT(2),
>  };
> +#define PKVM_PAGE_META_STATES_MASK     (~(BIT(0) | BIT(1)))
>
>  #define PKVM_PAGE_STATE_PROT_MASK      (KVM_PGTABLE_PROT_SW0 | KVM_PGTABLE_PROT_SW1)
>  static inline enum kvm_pgtable_prot pkvm_mkstate(enum kvm_pgtable_prot prot,
>                                                  enum pkvm_page_state state)
>  {
> -       return (prot & ~PKVM_PAGE_STATE_PROT_MASK) | state;
> +       BUG_ON(state & PKVM_PAGE_META_STATES_MASK);

This is a slight change in functionality, having a BUG_ON instead of
just masking out illegal states. Is it necessary?

Cheers,
/fuad


> +       prot &= ~PKVM_PAGE_STATE_PROT_MASK;
> +       prot |= FIELD_PREP(PKVM_PAGE_STATE_PROT_MASK, state);
> +       return prot;
>  }
>
>  static inline enum pkvm_page_state pkvm_getstate(enum kvm_pgtable_prot prot)
>  {
> -       return prot & PKVM_PAGE_STATE_PROT_MASK;
> +       return FIELD_GET(PKVM_PAGE_STATE_PROT_MASK, prot);
>  }
>
>  struct host_mmu {
> --
> 2.47.0.338.g60cca15819-goog
>
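Patch 01 renumbers the concrete states to bits 0 and 1. The same two-bit value can then either live in a per-page byte or be shifted into the PTE software bits with FIELD_PREP()/FIELD_GET(). The sketch below mirrors pkvm_mkstate()/pkvm_getstate() in userspace, with three assumptions:

- The SW0/SW1 bits are taken to be PTE bits 55 and 56, the values used for KVM_PGTABLE_PROT_SW0/SW1 on arm64.
- Small field_prep()/field_get() helpers built on __builtin_ctzll() stand in for the kernel's <linux/bitfield.h> macros.
- assert() plays the role of the BUG_ON() discussed in the review.

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n)			(1ULL << (n))
/* Assumption: SW0/SW1 sit at PTE bits 55 and 56. */
#define PROT_SW0		BIT(55)
#define PROT_SW1		BIT(56)
#define STATE_PROT_MASK		(PROT_SW0 | PROT_SW1)

enum page_state {
	PAGE_OWNED		= 0,
	PAGE_SHARED_OWNED	= BIT(0),
	PAGE_SHARED_BORROWED	= BIT(1),
	NOPAGE			= BIT(2),	/* meta-state: never stored in a PTE */
};
#define META_STATES_MASK	(~(BIT(0) | BIT(1)))

/* Userspace equivalents of FIELD_PREP()/FIELD_GET() for a contiguous mask. */
static uint64_t field_prep(uint64_t mask, uint64_t val)
{
	return (val << __builtin_ctzll(mask)) & mask;
}

static uint64_t field_get(uint64_t mask, uint64_t reg)
{
	return (reg & mask) >> __builtin_ctzll(mask);
}

static uint64_t mkstate(uint64_t prot, enum page_state state)
{
	assert(!(state & META_STATES_MASK));	/* stands in for BUG_ON() */
	prot &= ~STATE_PROT_MASK;
	return prot | field_prep(STATE_PROT_MASK, state);
}

static enum page_state getstate(uint64_t prot)
{
	return (enum page_state)field_get(STATE_PROT_MASK, prot);
}
```

With the original layout, the enum values were the PTE bits themselves, so a plain OR was enough. Once the values are small integers, the explicit shift keeps the PTE encoding unchanged.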



* Re: [PATCH v2 04/18] KVM: arm64: Move host page ownership tracking to the hyp vmemmap
  2024-12-03 10:37 ` [PATCH v2 04/18] KVM: arm64: Move host page ownership tracking to the hyp vmemmap Quentin Perret
@ 2024-12-10 13:02   ` Fuad Tabba
  2024-12-10 15:29     ` Quentin Perret
  0 siblings, 1 reply; 50+ messages in thread
From: Fuad Tabba @ 2024-12-10 13:02 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Vincent Donnefort,
	Sebastian Ene, linux-arm-kernel, kvmarm, linux-kernel

Hi Quentin,

On Tue, 3 Dec 2024 at 10:37, Quentin Perret <qperret@google.com> wrote:
>
> We currently store part of the page-tracking state in PTE software bits
> for the host, guests and the hypervisor. This is sub-optimal when e.g.
> sharing pages as this forces us to break block mappings purely to support
> this software tracking. This causes an unnecessarily fragmented stage-2
> page-table for the host in particular when it shares pages with Secure,
> which can lead to measurable regressions. Moreover, having this state
> stored in the page-table forces us to do multiple costly walks on the
> page transition path, hence causing overhead.
>
> In order to work around these problems, move the host-side page-tracking
> logic from SW bits in its stage-2 PTEs to the hypervisor's vmemmap.
>
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/kvm/hyp/include/nvhe/memory.h |  6 +-
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c    | 94 ++++++++++++++++--------
>  arch/arm64/kvm/hyp/nvhe/setup.c          |  7 +-
>  3 files changed, 71 insertions(+), 36 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
> index 88cb8ff9e769..08f3a0416d4c 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
> @@ -8,7 +8,7 @@
>  #include <linux/types.h>
>
>  /*
> - * SW bits 0-1 are reserved to track the memory ownership state of each page:
> + * Bits 0-1 are reserved to track the memory ownership state of each page:
>   *   00: The page is owned exclusively by the page-table owner.
>   *   01: The page is owned by the page-table owner, but is shared
>   *       with another entity.

Not shown in this patch, but a couple of lines below, you might want
to update the comment on PKVM_NOPAGE to fix the reference to "PTE's SW
bits":

> /* Meta-states which aren't encoded directly in the PTE's SW bits */
> PKVM_NOPAGE = BIT(2),

> @@ -44,7 +44,9 @@ static inline enum pkvm_page_state pkvm_getstate(enum kvm_pgtable_prot prot)
>  struct hyp_page {
>         u16 refcount;
>         u8 order;
> -       u8 reserved;
> +
> +       /* Host (non-meta) state. Guarded by the host stage-2 lock. */
> +       enum pkvm_page_state host_state : 8;
>  };
>
>  extern u64 __hyp_vmemmap;
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index caba3e4bd09e..1595081c4f6b 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -201,8 +201,8 @@ static void *guest_s2_zalloc_page(void *mc)
>
>         memset(addr, 0, PAGE_SIZE);
>         p = hyp_virt_to_page(addr);
> -       memset(p, 0, sizeof(*p));
>         p->refcount = 1;
> +       p->order = 0;
>
>         return addr;
>  }
> @@ -268,6 +268,7 @@ int kvm_guest_prepare_stage2(struct pkvm_hyp_vm *vm, void *pgd)
>
>  void reclaim_guest_pages(struct pkvm_hyp_vm *vm, struct kvm_hyp_memcache *mc)
>  {
> +       struct hyp_page *page;
>         void *addr;
>
>         /* Dump all pgtable pages in the hyp_pool */
> @@ -279,7 +280,9 @@ void reclaim_guest_pages(struct pkvm_hyp_vm *vm, struct kvm_hyp_memcache *mc)
>         /* Drain the hyp_pool into the memcache */
>         addr = hyp_alloc_pages(&vm->pool, 0);
>         while (addr) {
> -               memset(hyp_virt_to_page(addr), 0, sizeof(struct hyp_page));
> +               page = hyp_virt_to_page(addr);
> +               page->refcount = 0;
> +               page->order = 0;
>                 push_hyp_memcache(mc, addr, hyp_virt_to_phys);
>                 WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(addr), 1));
>                 addr = hyp_alloc_pages(&vm->pool, 0);
> @@ -382,19 +385,25 @@ bool addr_is_memory(phys_addr_t phys)
>         return !!find_mem_range(phys, &range);
>  }
>
> -static bool addr_is_allowed_memory(phys_addr_t phys)
> +static bool is_in_mem_range(u64 addr, struct kvm_mem_range *range)
> +{
> +       return range->start <= addr && addr < range->end;
> +}
> +
> +static int range_is_allowed_memory(u64 start, u64 end)

The name of this function "range_*is*_..." implies that it returns a
boolean, which other functions in this file (and patch) with similar
names do, but it returns an error code instead. Maybe
check_range_allowed_memory()?

>  {
>         struct memblock_region *reg;
>         struct kvm_mem_range range;
>
> -       reg = find_mem_range(phys, &range);
> +       /* Can't check the state of both MMIO and memory regions at once */

I don't understand this comment in relation to the code. Could you
explain it to me please?

> +       reg = find_mem_range(start, &range);
> +       if (!is_in_mem_range(end - 1, &range))
> +               return -EINVAL;
>
> -       return reg && !(reg->flags & MEMBLOCK_NOMAP);
> -}
> +       if (!reg || reg->flags & MEMBLOCK_NOMAP)
> +               return -EPERM;
>
> -static bool is_in_mem_range(u64 addr, struct kvm_mem_range *range)
> -{
> -       return range->start <= addr && addr < range->end;
> +       return 0;
>  }
>
>  static bool range_is_memory(u64 start, u64 end)
> @@ -454,8 +463,11 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
>         if (kvm_pte_valid(pte))
>                 return -EAGAIN;
>
> -       if (pte)
> +       if (pte) {
> +               WARN_ON(addr_is_memory(addr) &&
> +                       !(hyp_phys_to_page(addr)->host_state & PKVM_NOPAGE));

nit: since the host state is now an enum, should this just be an
equality check rather than an &? That would make it consistent with the
other checks of pkvm_page_state in this patch.

>                 return -EPERM;
> +       }
>
>         do {
>                 u64 granule = kvm_granule_size(level);
> @@ -477,10 +489,29 @@ int host_stage2_idmap_locked(phys_addr_t addr, u64 size,
>         return host_stage2_try(__host_stage2_idmap, addr, addr + size, prot);
>  }
>
> +static void __host_update_page_state(phys_addr_t addr, u64 size, enum pkvm_page_state state)
> +{
> +       phys_addr_t end = addr + size;

nit: newline

> +       for (; addr < end; addr += PAGE_SIZE)
> +               hyp_phys_to_page(addr)->host_state = state;
> +}
> +
>  int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
>  {
> -       return host_stage2_try(kvm_pgtable_stage2_set_owner, &host_mmu.pgt,
> -                              addr, size, &host_s2_pool, owner_id);
> +       int ret;
> +
> +       ret = host_stage2_try(kvm_pgtable_stage2_set_owner, &host_mmu.pgt,
> +                             addr, size, &host_s2_pool, owner_id);
> +       if (ret || !addr_is_memory(addr))
> +               return ret;

Can hyp set an owner for an address that isn't memory? Trying to
understand why we need to update the host stage2 pagetable but not the
hypervisor's vmemmap in that case.

> +
> +       /* Don't forget to update the vmemmap tracking for the host */
> +       if (owner_id == PKVM_ID_HOST)
> +               __host_update_page_state(addr, size, PKVM_PAGE_OWNED);
> +       else
> +               __host_update_page_state(addr, size, PKVM_NOPAGE);
> +
> +       return 0;
>  }
>
>  static bool host_stage2_force_pte_cb(u64 addr, u64 end, enum kvm_pgtable_prot prot)
> @@ -604,35 +635,38 @@ static int check_page_state_range(struct kvm_pgtable *pgt, u64 addr, u64 size,
>         return kvm_pgtable_walk(pgt, addr, size, &walker);
>  }
>
> -static enum pkvm_page_state host_get_page_state(kvm_pte_t pte, u64 addr)
> -{
> -       if (!addr_is_allowed_memory(addr))
> -               return PKVM_NOPAGE;
> -
> -       if (!kvm_pte_valid(pte) && pte)
> -               return PKVM_NOPAGE;
> -
> -       return pkvm_getstate(kvm_pgtable_stage2_pte_prot(pte));
> -}
> -
>  static int __host_check_page_state_range(u64 addr, u64 size,
>                                          enum pkvm_page_state state)
>  {
> -       struct check_walk_data d = {
> -               .desired        = state,
> -               .get_page_state = host_get_page_state,
> -       };
> +       u64 end = addr + size;
> +       int ret;
> +
> +       ret = range_is_allowed_memory(addr, end);
> +       if (ret)
> +               return ret;
>
>         hyp_assert_lock_held(&host_mmu.lock);
> -       return check_page_state_range(&host_mmu.pgt, addr, size, &d);
> +       for (; addr < end; addr += PAGE_SIZE) {
> +               if (hyp_phys_to_page(addr)->host_state != state)
> +                       return -EPERM;
> +       }
> +
> +       return 0;
>  }
>
>  static int __host_set_page_state_range(u64 addr, u64 size,
>                                        enum pkvm_page_state state)
>  {
> -       enum kvm_pgtable_prot prot = pkvm_mkstate(PKVM_HOST_MEM_PROT, state);
> +       if (hyp_phys_to_page(addr)->host_state & PKVM_NOPAGE) {

Same nit as above regarding checking for PKVM_NOPAGE

Cheers,
/fuad


> +               int ret = host_stage2_idmap_locked(addr, size, PKVM_HOST_MEM_PROT);
>
> -       return host_stage2_idmap_locked(addr, size, prot);
> +               if (ret)
> +                       return ret;
> +       }
> +
> +       __host_update_page_state(addr, size, state);
> +
> +       return 0;
>  }
>
>  static int host_request_owned_transition(u64 *completer_addr,
> diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
> index cbdd18cd3f98..7e04d1c2a03d 100644
> --- a/arch/arm64/kvm/hyp/nvhe/setup.c
> +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
> @@ -180,7 +180,6 @@ static void hpool_put_page(void *addr)
>  static int fix_host_ownership_walker(const struct kvm_pgtable_visit_ctx *ctx,
>                                      enum kvm_pgtable_walk_flags visit)
>  {
> -       enum kvm_pgtable_prot prot;
>         enum pkvm_page_state state;
>         phys_addr_t phys;
>
> @@ -203,16 +202,16 @@ static int fix_host_ownership_walker(const struct kvm_pgtable_visit_ctx *ctx,
>         case PKVM_PAGE_OWNED:
>                 return host_stage2_set_owner_locked(phys, PAGE_SIZE, PKVM_ID_HYP);
>         case PKVM_PAGE_SHARED_OWNED:
> -               prot = pkvm_mkstate(PKVM_HOST_MEM_PROT, PKVM_PAGE_SHARED_BORROWED);
> +               hyp_phys_to_page(phys)->host_state = PKVM_PAGE_SHARED_BORROWED;
>                 break;
>         case PKVM_PAGE_SHARED_BORROWED:
> -               prot = pkvm_mkstate(PKVM_HOST_MEM_PROT, PKVM_PAGE_SHARED_OWNED);
> +               hyp_phys_to_page(phys)->host_state = PKVM_PAGE_SHARED_OWNED;
>                 break;
>         default:
>                 return -EINVAL;
>         }
>
> -       return host_stage2_idmap_locked(phys, PAGE_SIZE, prot);
> +       return 0;
>  }
>
>  static int fix_hyp_pgtable_refcnt_walker(const struct kvm_pgtable_visit_ctx *ctx,
> --
> 2.47.0.338.g60cca15819-goog
>
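Patch 04 replaces the page-table walk in __host_check_page_state_range() with a per-page lookup in the hypervisor's vmemmap. The sketch below models that lookup in userspace with a hypothetical eight-page memory region starting at physical address 0. The names, the single-region range check (named after the review suggestion) and the struct layout are simplifications, not the hypervisor code. host_state now shares a struct with refcount and order. This is presumably why the patch replaces memset() of struct hyp_page with explicit refcount/order assignments: a memset() would also reset the host state.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1ULL << PAGE_SHIFT)
#define NR_PAGES	8	/* hypothetical: one small memory region at PA 0 */

enum page_state {
	PAGE_OWNED		= 0,
	PAGE_SHARED_OWNED	= 1 << 0,
	PAGE_SHARED_BORROWED	= 1 << 1,
	NOPAGE			= 1 << 2,
};

/* Per-page metadata, in the spirit of struct hyp_page. */
struct page_meta {
	uint16_t refcount;
	uint8_t order;
	uint8_t host_state;	/* the patch uses an 8-bit enum bitfield */
};

static struct page_meta vmemmap[NR_PAGES];

static struct page_meta *phys_to_page(uint64_t phys)
{
	return &vmemmap[phys >> PAGE_SHIFT];
}

/* Reject ranges that leave the (single) memory region. */
static int check_range_allowed_memory(uint64_t start, uint64_t end)
{
	return (start < end && end <= NR_PAGES * PAGE_SIZE) ? 0 : -EINVAL;
}

/* Every page in the range must be in exactly the expected state. */
static int host_check_page_state_range(uint64_t addr, uint64_t size,
				       enum page_state state)
{
	uint64_t end = addr + size;
	int ret = check_range_allowed_memory(addr, end);

	if (ret)
		return ret;
	for (; addr < end; addr += PAGE_SIZE)
		if (phys_to_page(addr)->host_state != state)
			return -EPERM;
	return 0;
}

static void host_update_page_state(uint64_t addr, uint64_t size,
				   enum page_state state)
{
	uint64_t end = addr + size;

	for (; addr < end; addr += PAGE_SIZE)
		phys_to_page(addr)->host_state = state;
}
```

The checks compare the enum value with ==, as suggested in the review. For the five values defined here, state & NOPAGE and state == NOPAGE give the same answer, but the equality test states the intent.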



* Re: [PATCH v2 10/18] KVM: arm64: Introduce __pkvm_host_share_guest()
  2024-12-03 10:37 ` [PATCH v2 10/18] KVM: arm64: Introduce __pkvm_host_share_guest() Quentin Perret
@ 2024-12-10 13:58   ` Fuad Tabba
  2024-12-10 15:41     ` Quentin Perret
  0 siblings, 1 reply; 50+ messages in thread
From: Fuad Tabba @ 2024-12-10 13:58 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Vincent Donnefort,
	Sebastian Ene, linux-arm-kernel, kvmarm, linux-kernel

Hi Quentin,

On Tue, 3 Dec 2024 at 10:37, Quentin Perret <qperret@google.com> wrote:
>
> In preparation for handling guest stage-2 mappings at EL2, introduce a
> new pKVM hypercall allowing the host to share pages with non-protected guests.
>
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/include/asm/kvm_asm.h              |  1 +
>  arch/arm64/include/asm/kvm_host.h             |  3 +
>  arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  1 +
>  arch/arm64/kvm/hyp/include/nvhe/memory.h      |  2 +
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c            | 34 +++++++++
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 70 +++++++++++++++++++
>  arch/arm64/kvm/hyp/nvhe/pkvm.c                |  7 ++
>  7 files changed, 118 insertions(+)
>
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index 89c0fac69551..449337f5b2a3 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -65,6 +65,7 @@ enum __kvm_host_smccc_func {
>         /* Hypercalls available after pKVM finalisation */
>         __KVM_HOST_SMCCC_FUNC___pkvm_host_share_hyp,
>         __KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_hyp,
> +       __KVM_HOST_SMCCC_FUNC___pkvm_host_share_guest,
>         __KVM_HOST_SMCCC_FUNC___kvm_adjust_pc,
>         __KVM_HOST_SMCCC_FUNC___kvm_vcpu_run,
>         __KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index e18e9244d17a..f75988e3515b 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -771,6 +771,9 @@ struct kvm_vcpu_arch {
>         /* Cache some mmu pages needed inside spinlock regions */
>         struct kvm_mmu_memory_cache mmu_page_cache;
>
> +       /* Pages to be donated to pkvm/EL2 if it runs out */

Runs out of what? :) I'm being facetious, but the comment is a bit
unclear about which resource it refers to.

> +       struct kvm_hyp_memcache pkvm_memcache;
> +
>         /* Virtual SError ESR to restore when HCR_EL2.VSE is set */
>         u64 vsesr_el2;
>
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> index 25038ac705d8..a7976e50f556 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> @@ -39,6 +39,7 @@ int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages);
>  int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
>  int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
>  int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
> +int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu, enum kvm_pgtable_prot prot);
>
>  bool addr_is_memory(phys_addr_t phys);
>  int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
> index 08f3a0416d4c..457318215155 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
> @@ -47,6 +47,8 @@ struct hyp_page {
>
>         /* Host (non-meta) state. Guarded by the host stage-2 lock. */
>         enum pkvm_page_state host_state : 8;
> +
> +       u32 host_share_guest_count;
>  };
>
>  extern u64 __hyp_vmemmap;
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index 95d78db315b3..d659462fbf5d 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> @@ -211,6 +211,39 @@ static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
>         cpu_reg(host_ctxt, 1) =  ret;
>  }
>
> +static int pkvm_refill_memcache(struct pkvm_hyp_vcpu *hyp_vcpu)
> +{
> +       struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
> +
> +       return refill_memcache(&hyp_vcpu->vcpu.arch.pkvm_memcache,
> +                              host_vcpu->arch.pkvm_memcache.nr_pages,
> +                              &host_vcpu->arch.pkvm_memcache);
> +}
> +
> +static void handle___pkvm_host_share_guest(struct kvm_cpu_context *host_ctxt)
> +{
> +       DECLARE_REG(u64, pfn, host_ctxt, 1);
> +       DECLARE_REG(u64, gfn, host_ctxt, 2);
> +       DECLARE_REG(enum kvm_pgtable_prot, prot, host_ctxt, 3);
> +       struct pkvm_hyp_vcpu *hyp_vcpu;
> +       int ret = -EINVAL;
> +
> +       if (!is_protected_kvm_enabled())
> +               goto out;
> +
> +       hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
> +       if (!hyp_vcpu || pkvm_hyp_vcpu_is_protected(hyp_vcpu))
> +               goto out;
> +
> +       ret = pkvm_refill_memcache(hyp_vcpu);
> +       if (ret)
> +               goto out;
> +
> +       ret = __pkvm_host_share_guest(pfn, gfn, hyp_vcpu, prot);
> +out:
> +       cpu_reg(host_ctxt, 1) =  ret;
> +}
> +
>  static void handle___kvm_adjust_pc(struct kvm_cpu_context *host_ctxt)
>  {
>         DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
> @@ -420,6 +453,7 @@ static const hcall_t host_hcall[] = {
>
>         HANDLE_FUNC(__pkvm_host_share_hyp),
>         HANDLE_FUNC(__pkvm_host_unshare_hyp),
> +       HANDLE_FUNC(__pkvm_host_share_guest),
>         HANDLE_FUNC(__kvm_adjust_pc),
>         HANDLE_FUNC(__kvm_vcpu_run),
>         HANDLE_FUNC(__kvm_flush_vm_context),
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 1595081c4f6b..a69d7212b64c 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -861,6 +861,27 @@ static int hyp_complete_donation(u64 addr,
>         return pkvm_create_mappings_locked(start, end, prot);
>  }
>
> +static enum pkvm_page_state guest_get_page_state(kvm_pte_t pte, u64 addr)
> +{
> +       if (!kvm_pte_valid(pte))
> +               return PKVM_NOPAGE;
> +
> +       return pkvm_getstate(kvm_pgtable_stage2_pte_prot(pte));
> +}
> +
> +static int __guest_check_page_state_range(struct pkvm_hyp_vcpu *vcpu, u64 addr,
> +                                         u64 size, enum pkvm_page_state state)
> +{
> +       struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
> +       struct check_walk_data d = {
> +               .desired        = state,
> +               .get_page_state = guest_get_page_state,
> +       };
> +
> +       hyp_assert_lock_held(&vm->lock);
> +       return check_page_state_range(&vm->pgt, addr, size, &d);
> +}
> +
>  static int check_share(struct pkvm_mem_share *share)
>  {
>         const struct pkvm_mem_transition *tx = &share->tx;
> @@ -1343,3 +1364,52 @@ int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages)
>
>         return ret;
>  }
> +
> +int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu,
> +                           enum kvm_pgtable_prot prot)
> +{
> +       struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
> +       u64 phys = hyp_pfn_to_phys(pfn);
> +       u64 ipa = hyp_pfn_to_phys(gfn);
> +       struct hyp_page *page;
> +       int ret;
> +
> +       if (prot & ~KVM_PGTABLE_PROT_RWX)
> +               return -EINVAL;
> +
> +       ret = range_is_allowed_memory(phys, phys + PAGE_SIZE);
> +       if (ret)
> +               return ret;
> +
> +       host_lock_component();
> +       guest_lock_component(vm);
> +
> +       ret = __guest_check_page_state_range(vcpu, ipa, PAGE_SIZE, PKVM_NOPAGE);
> +       if (ret)
> +               goto unlock;
> +
> +       page = hyp_phys_to_page(phys);
> +       switch (page->host_state) {
> +       case PKVM_PAGE_OWNED:
> +               WARN_ON(__host_set_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_SHARED_OWNED));
> +               break;
> +       case PKVM_PAGE_SHARED_OWNED:
> +               /* Only host to np-guest multi-sharing is tolerated */

Initially I thought the comment related to the warning below, which
confused me. Now I think what you're trying to say is that we allow the
share, and that the warning (unrelated to the comment) checks that the
PKVM_PAGE_SHARED_OWNED state is consistent with the share count.

I think what you should have here, which would work better with the
comment, is something like:

                /* Only host to np-guest multi-sharing is tolerated */
+               if (pkvm_hyp_vcpu_is_protected(vcpu)) {
+                       ret = -EPERM;
+                       goto unlock;
+               }

That would even make the comment unnecessary.


> +               WARN_ON(!page->host_share_guest_count);
> +               break;
> +       default:
> +               ret = -EPERM;
> +               goto unlock;
> +       }
> +
> +       WARN_ON(kvm_pgtable_stage2_map(&vm->pgt, ipa, PAGE_SIZE, phys,
> +                                      pkvm_mkstate(prot, PKVM_PAGE_SHARED_BORROWED),
> +                                      &vcpu->vcpu.arch.pkvm_memcache, 0));
> +       page->host_share_guest_count++;
> +
> +unlock:
> +       guest_unlock_component(vm);
> +       host_unlock_component();
> +
> +       return ret;
> +}
> diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
> index d5c23449a64c..d6c61a5e7b6e 100644
> --- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
> +++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
> @@ -795,6 +795,13 @@ int __pkvm_teardown_vm(pkvm_handle_t handle)
>         /* Push the metadata pages to the teardown memcache */
>         for (idx = 0; idx < hyp_vm->nr_vcpus; ++idx) {
>                 struct pkvm_hyp_vcpu *hyp_vcpu = hyp_vm->vcpus[idx];
> +               struct kvm_hyp_memcache *vcpu_mc = &hyp_vcpu->vcpu.arch.pkvm_memcache;
> +
> +               while (vcpu_mc->nr_pages) {
> +                       void *addr = pop_hyp_memcache(vcpu_mc, hyp_phys_to_virt);

nit: missing blank line after the declaration of addr.

Cheers,
/fuad



> +                       push_hyp_memcache(mc, addr, hyp_virt_to_phys);
> +                       unmap_donated_memory_noclear(addr, PAGE_SIZE);
> +               }
>
>                 teardown_donated_memory(mc, hyp_vcpu, sizeof(*hyp_vcpu));
>         }
> --
> 2.47.0.338.g60cca15819-goog
>



* Re: [PATCH v2 11/18] KVM: arm64: Introduce __pkvm_host_unshare_guest()
  2024-12-03 10:37 ` [PATCH v2 11/18] KVM: arm64: Introduce __pkvm_host_unshare_guest() Quentin Perret
@ 2024-12-10 14:41   ` Fuad Tabba
  2024-12-10 15:53     ` Quentin Perret
  0 siblings, 1 reply; 50+ messages in thread
From: Fuad Tabba @ 2024-12-10 14:41 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Vincent Donnefort,
	Sebastian Ene, linux-arm-kernel, kvmarm, linux-kernel

Hi Quentin,

On Tue, 3 Dec 2024 at 10:38, Quentin Perret <qperret@google.com> wrote:
>
> In preparation for letting the host unmap pages from non-protected
> guests, introduce a new hypercall implementing the host-unshare-guest
> transition.
>
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/include/asm/kvm_asm.h              |  1 +
>  arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  1 +
>  arch/arm64/kvm/hyp/include/nvhe/pkvm.h        |  5 ++
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c            | 24 +++++++
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 67 +++++++++++++++++++
>  5 files changed, 98 insertions(+)
>
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index 449337f5b2a3..0b6c4d325134 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -66,6 +66,7 @@ enum __kvm_host_smccc_func {
>         __KVM_HOST_SMCCC_FUNC___pkvm_host_share_hyp,
>         __KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_hyp,
>         __KVM_HOST_SMCCC_FUNC___pkvm_host_share_guest,
> +       __KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_guest,
>         __KVM_HOST_SMCCC_FUNC___kvm_adjust_pc,
>         __KVM_HOST_SMCCC_FUNC___kvm_vcpu_run,
>         __KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> index a7976e50f556..e528a42ed60e 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> @@ -40,6 +40,7 @@ int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
>  int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
>  int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
>  int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu, enum kvm_pgtable_prot prot);
> +int __pkvm_host_unshare_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm);

The parameters of share_guest and unshare_guest are quite different. I
think unshare's approach, taking the hyp_vm rather than the hyp_vcpu,
makes more sense. Either way, one of the two should change so that
they are consistent.

>  bool addr_is_memory(phys_addr_t phys);
>  int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
> index be52c5b15e21..5dfc9ece9aa5 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
> @@ -64,6 +64,11 @@ static inline bool pkvm_hyp_vcpu_is_protected(struct pkvm_hyp_vcpu *hyp_vcpu)
>         return vcpu_is_protected(&hyp_vcpu->vcpu);
>  }
>
> +static inline bool pkvm_hyp_vm_is_protected(struct pkvm_hyp_vm *hyp_vm)
> +{
> +       return kvm_vm_is_protected(&hyp_vm->kvm);
> +}
> +
>  void pkvm_hyp_vm_table_init(void *tbl);
>
>  int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index d659462fbf5d..04a9053ae1d5 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> @@ -244,6 +244,29 @@ static void handle___pkvm_host_share_guest(struct kvm_cpu_context *host_ctxt)
>         cpu_reg(host_ctxt, 1) =  ret;
>  }
>
> +static void handle___pkvm_host_unshare_guest(struct kvm_cpu_context *host_ctxt)
> +{
> +       DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
> +       DECLARE_REG(u64, gfn, host_ctxt, 2);
> +       struct pkvm_hyp_vm *hyp_vm;
> +       int ret = -EINVAL;
> +
> +       if (!is_protected_kvm_enabled())
> +               goto out;
> +
> +       hyp_vm = get_pkvm_hyp_vm(handle);
> +       if (!hyp_vm)
> +               goto out;
> +       if (pkvm_hyp_vm_is_protected(hyp_vm))
> +               goto put_hyp_vm;

bikeshedding: is -EINVAL the best return value, or might -EPERM be
better if the VM is protected?

> +
> +       ret = __pkvm_host_unshare_guest(gfn, hyp_vm);
> +put_hyp_vm:
> +       put_pkvm_hyp_vm(hyp_vm);
> +out:
> +       cpu_reg(host_ctxt, 1) =  ret;
> +}
> +
>  static void handle___kvm_adjust_pc(struct kvm_cpu_context *host_ctxt)
>  {
>         DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
> @@ -454,6 +477,7 @@ static const hcall_t host_hcall[] = {
>         HANDLE_FUNC(__pkvm_host_share_hyp),
>         HANDLE_FUNC(__pkvm_host_unshare_hyp),
>         HANDLE_FUNC(__pkvm_host_share_guest),
> +       HANDLE_FUNC(__pkvm_host_unshare_guest),
>         HANDLE_FUNC(__kvm_adjust_pc),
>         HANDLE_FUNC(__kvm_vcpu_run),
>         HANDLE_FUNC(__kvm_flush_vm_context),
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index a69d7212b64c..aa27a3e42e5e 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -1413,3 +1413,70 @@ int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu,
>
>         return ret;
>  }
> +
> +static int __check_host_unshare_guest(struct pkvm_hyp_vm *vm, u64 *__phys, u64 ipa)

nit: sometimes (in this and other patches) you use vm to refer to a
pkvm_hyp_vm, and other times hyp_vm. That makes grepping a bit
trickier.

Cheers,
/fuad


> +{
> +       enum pkvm_page_state state;
> +       struct hyp_page *page;
> +       kvm_pte_t pte;
> +       u64 phys;
> +       s8 level;
> +       int ret;
> +
> +       ret = kvm_pgtable_get_leaf(&vm->pgt, ipa, &pte, &level);
> +       if (ret)
> +               return ret;
> +       if (level != KVM_PGTABLE_LAST_LEVEL)
> +               return -E2BIG;
> +       if (!kvm_pte_valid(pte))
> +               return -ENOENT;
> +
> +       state = guest_get_page_state(pte, ipa);
> +       if (state != PKVM_PAGE_SHARED_BORROWED)
> +               return -EPERM;
> +
> +       phys = kvm_pte_to_phys(pte);
> +       ret = range_is_allowed_memory(phys, phys + PAGE_SIZE);
> +       if (WARN_ON(ret))
> +               return ret;
> +
> +       page = hyp_phys_to_page(phys);
> +       if (page->host_state != PKVM_PAGE_SHARED_OWNED)
> +               return -EPERM;
> +       if (WARN_ON(!page->host_share_guest_count))
> +               return -EINVAL;
> +
> +       *__phys = phys;
> +
> +       return 0;
> +}
> +
> +int __pkvm_host_unshare_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm)
> +{
> +       u64 ipa = hyp_pfn_to_phys(gfn);
> +       struct hyp_page *page;
> +       u64 phys;
> +       int ret;
> +
> +       host_lock_component();
> +       guest_lock_component(hyp_vm);
> +
> +       ret = __check_host_unshare_guest(hyp_vm, &phys, ipa);
> +       if (ret)
> +               goto unlock;
> +
> +       ret = kvm_pgtable_stage2_unmap(&hyp_vm->pgt, ipa, PAGE_SIZE);
> +       if (ret)
> +               goto unlock;
> +
> +       page = hyp_phys_to_page(phys);
> +       page->host_share_guest_count--;
> +       if (!page->host_share_guest_count)
> +               WARN_ON(__host_set_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_OWNED));
> +
> +unlock:
> +       guest_unlock_component(hyp_vm);
> +       host_unlock_component();
> +
> +       return ret;
> +}
> --
> 2.47.0.338.g60cca15819-goog
>



* Re: [PATCH v2 12/18] KVM: arm64: Introduce __pkvm_host_relax_guest_perms()
  2024-12-03 10:37 ` [PATCH v2 12/18] KVM: arm64: Introduce __pkvm_host_relax_guest_perms() Quentin Perret
@ 2024-12-10 14:56   ` Fuad Tabba
  2024-12-11  8:57     ` Quentin Perret
  0 siblings, 1 reply; 50+ messages in thread
From: Fuad Tabba @ 2024-12-10 14:56 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Vincent Donnefort,
	Sebastian Ene, linux-arm-kernel, kvmarm, linux-kernel

Hi Quentin,

On Tue, 3 Dec 2024 at 10:38, Quentin Perret <qperret@google.com> wrote:
>
> Introduce a new hypercall allowing the host to relax the stage-2
> permissions of mappings in a non-protected guest page-table. It will be
> used later once we start allowing RO memslots and dirty logging.
>
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/include/asm/kvm_asm.h              |  1 +
>  arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  1 +
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c            | 20 ++++++++++++++++
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 23 +++++++++++++++++++
>  4 files changed, 45 insertions(+)
>
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index 0b6c4d325134..5d51933e44fb 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -67,6 +67,7 @@ enum __kvm_host_smccc_func {
>         __KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_hyp,
>         __KVM_HOST_SMCCC_FUNC___pkvm_host_share_guest,
>         __KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_guest,
> +       __KVM_HOST_SMCCC_FUNC___pkvm_host_relax_guest_perms,
>         __KVM_HOST_SMCCC_FUNC___kvm_adjust_pc,
>         __KVM_HOST_SMCCC_FUNC___kvm_vcpu_run,
>         __KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> index e528a42ed60e..db0dd83c2457 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> @@ -41,6 +41,7 @@ int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
>  int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
>  int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu, enum kvm_pgtable_prot prot);
>  int __pkvm_host_unshare_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm);
> +int __pkvm_host_relax_guest_perms(u64 gfn, enum kvm_pgtable_prot prot, struct pkvm_hyp_vcpu *vcpu);

The parameters match those of __pkvm_host_share_guest (minus the pfn),
but in a different order. Looking ahead at later patches in the series,
I see similar issues with parameter types and ordering, so I won't
mention them for the later patches.


>  bool addr_is_memory(phys_addr_t phys);
>  int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index 04a9053ae1d5..60dd56bbd743 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> @@ -267,6 +267,25 @@ static void handle___pkvm_host_unshare_guest(struct kvm_cpu_context *host_ctxt)
>         cpu_reg(host_ctxt, 1) =  ret;
>  }
>
> +static void handle___pkvm_host_relax_guest_perms(struct kvm_cpu_context *host_ctxt)
> +{
> +       DECLARE_REG(u64, gfn, host_ctxt, 1);
> +       DECLARE_REG(enum kvm_pgtable_prot, prot, host_ctxt, 2);
> +       struct pkvm_hyp_vcpu *hyp_vcpu;
> +       int ret = -EINVAL;
> +
> +       if (!is_protected_kvm_enabled())
> +               goto out;
> +
> +       hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
> +       if (!hyp_vcpu || pkvm_hyp_vcpu_is_protected(hyp_vcpu))
> +               goto out;
> +
> +       ret = __pkvm_host_relax_guest_perms(gfn, prot, hyp_vcpu);
> +out:
> +       cpu_reg(host_ctxt, 1) = ret;
> +}
> +
>  static void handle___kvm_adjust_pc(struct kvm_cpu_context *host_ctxt)
>  {
>         DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
> @@ -478,6 +497,7 @@ static const hcall_t host_hcall[] = {
>         HANDLE_FUNC(__pkvm_host_unshare_hyp),
>         HANDLE_FUNC(__pkvm_host_share_guest),
>         HANDLE_FUNC(__pkvm_host_unshare_guest),
> +       HANDLE_FUNC(__pkvm_host_relax_guest_perms),
>         HANDLE_FUNC(__kvm_adjust_pc),
>         HANDLE_FUNC(__kvm_vcpu_run),
>         HANDLE_FUNC(__kvm_flush_vm_context),
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index aa27a3e42e5e..d4b28e93e790 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -1480,3 +1480,26 @@ int __pkvm_host_unshare_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm)
>
>         return ret;
>  }
> +
> +int __pkvm_host_relax_guest_perms(u64 gfn, enum kvm_pgtable_prot prot, struct pkvm_hyp_vcpu *vcpu)
> +{
> +       struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
> +       u64 ipa = hyp_pfn_to_phys(gfn);
> +       u64 phys;
> +       int ret;
> +
> +       if ((prot & KVM_PGTABLE_PROT_RWX) != prot)
> +               return -EPERM;

Why not

+       if (prot & ~KVM_PGTABLE_PROT_RWX)

Simpler and consistent with similar checks in the file (e.g.,
__pkvm_host_share_guest)

Cheers,
/fuad


> +
> +       host_lock_component();
> +       guest_lock_component(vm);
> +
> +       ret = __check_host_unshare_guest(vm, &phys, ipa);
> +       if (!ret)
> +               ret = kvm_pgtable_stage2_relax_perms(&vm->pgt, ipa, prot, 0);
> +
> +       guest_unlock_component(vm);
> +       host_unlock_component();
> +
> +       return ret;
> +}
> --
> 2.47.0.338.g60cca15819-goog
>



* Re: [PATCH v2 13/18] KVM: arm64: Introduce __pkvm_host_wrprotect_guest()
  2024-12-03 10:37 ` [PATCH v2 13/18] KVM: arm64: Introduce __pkvm_host_wrprotect_guest() Quentin Perret
@ 2024-12-10 15:06   ` Fuad Tabba
  2024-12-10 19:38     ` Quentin Perret
  0 siblings, 1 reply; 50+ messages in thread
From: Fuad Tabba @ 2024-12-10 15:06 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Vincent Donnefort,
	Sebastian Ene, linux-arm-kernel, kvmarm, linux-kernel

Hi Quentin,

On Tue, 3 Dec 2024 at 10:38, Quentin Perret <qperret@google.com> wrote:
>
> Introduce a new hypercall to remove the write permission from a
> non-protected guest stage-2 mapping. This will be used for e.g. enabling
> dirty logging.
>
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/include/asm/kvm_asm.h              |  1 +
>  arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  1 +
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c            | 24 +++++++++++++++++++
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 19 +++++++++++++++
>  4 files changed, 45 insertions(+)
>
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index 5d51933e44fb..4d7d20ea03df 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -68,6 +68,7 @@ enum __kvm_host_smccc_func {
>         __KVM_HOST_SMCCC_FUNC___pkvm_host_share_guest,
>         __KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_guest,
>         __KVM_HOST_SMCCC_FUNC___pkvm_host_relax_guest_perms,
> +       __KVM_HOST_SMCCC_FUNC___pkvm_host_wrprotect_guest,
>         __KVM_HOST_SMCCC_FUNC___kvm_adjust_pc,
>         __KVM_HOST_SMCCC_FUNC___kvm_vcpu_run,
>         __KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> index db0dd83c2457..8658b5932473 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> @@ -42,6 +42,7 @@ int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
>  int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu, enum kvm_pgtable_prot prot);
>  int __pkvm_host_unshare_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm);
>  int __pkvm_host_relax_guest_perms(u64 gfn, enum kvm_pgtable_prot prot, struct pkvm_hyp_vcpu *vcpu);
> +int __pkvm_host_wrprotect_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm);
>
>  bool addr_is_memory(phys_addr_t phys);
>  int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index 60dd56bbd743..3feaf2119e51 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> @@ -286,6 +286,29 @@ static void handle___pkvm_host_relax_guest_perms(struct kvm_cpu_context *host_ct
>         cpu_reg(host_ctxt, 1) = ret;
>  }
>
> +static void handle___pkvm_host_wrprotect_guest(struct kvm_cpu_context *host_ctxt)
> +{
> +       DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
> +       DECLARE_REG(u64, gfn, host_ctxt, 2);
> +       struct pkvm_hyp_vm *hyp_vm;
> +       int ret = -EINVAL;
> +
> +       if (!is_protected_kvm_enabled())
> +               goto out;
> +
> +       hyp_vm = get_pkvm_hyp_vm(handle);
> +       if (!hyp_vm)
> +               goto out;
> +       if (pkvm_hyp_vm_is_protected(hyp_vm))
> +               goto put_hyp_vm;

These checks are (unsurprisingly) the same for all these functions.
Does it make sense to have a helper do these checks?

Cheers,
/fuad

> +       ret = __pkvm_host_wrprotect_guest(gfn, hyp_vm);
> +put_hyp_vm:
> +       put_pkvm_hyp_vm(hyp_vm);
> +out:
> +       cpu_reg(host_ctxt, 1) = ret;
> +}
> +
>  static void handle___kvm_adjust_pc(struct kvm_cpu_context *host_ctxt)
>  {
>         DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
> @@ -498,6 +521,7 @@ static const hcall_t host_hcall[] = {
>         HANDLE_FUNC(__pkvm_host_share_guest),
>         HANDLE_FUNC(__pkvm_host_unshare_guest),
>         HANDLE_FUNC(__pkvm_host_relax_guest_perms),
> +       HANDLE_FUNC(__pkvm_host_wrprotect_guest),
>         HANDLE_FUNC(__kvm_adjust_pc),
>         HANDLE_FUNC(__kvm_vcpu_run),
>         HANDLE_FUNC(__kvm_flush_vm_context),
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index d4b28e93e790..89312d7cde2a 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -1503,3 +1503,22 @@ int __pkvm_host_relax_guest_perms(u64 gfn, enum kvm_pgtable_prot prot, struct pk
>
>         return ret;
>  }
> +
> +int __pkvm_host_wrprotect_guest(u64 gfn, struct pkvm_hyp_vm *vm)
> +{
> +       u64 ipa = hyp_pfn_to_phys(gfn);
> +       u64 phys;
> +       int ret;
> +
> +       host_lock_component();
> +       guest_lock_component(vm);
> +
> +       ret = __check_host_unshare_guest(vm, &phys, ipa);
> +       if (!ret)
> +               ret = kvm_pgtable_stage2_wrprotect(&vm->pgt, ipa, PAGE_SIZE);
> +
> +       guest_unlock_component(vm);
> +       host_unlock_component();
> +
> +       return ret;
> +}
> --
> 2.47.0.338.g60cca15819-goog
>



* Re: [PATCH v2 14/18] KVM: arm64: Introduce __pkvm_host_test_clear_young_guest()
  2024-12-03 10:37 ` [PATCH v2 14/18] KVM: arm64: Introduce __pkvm_host_test_clear_young_guest() Quentin Perret
@ 2024-12-10 15:11   ` Fuad Tabba
  2024-12-10 19:39     ` Quentin Perret
  0 siblings, 1 reply; 50+ messages in thread
From: Fuad Tabba @ 2024-12-10 15:11 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Vincent Donnefort,
	Sebastian Ene, linux-arm-kernel, kvmarm, linux-kernel

Hi Quentin,

On Tue, 3 Dec 2024 at 10:38, Quentin Perret <qperret@google.com> wrote:
>
> Plumb the kvm_stage2_test_clear_young() callback into pKVM for
> non-protected guests. It will later be called from MMU notifiers.
>
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/include/asm/kvm_asm.h              |  1 +
>  arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  1 +
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c            | 25 +++++++++++++++++++
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 19 ++++++++++++++
>  4 files changed, 46 insertions(+)
>
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index 4d7d20ea03df..cb676017d591 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -69,6 +69,7 @@ enum __kvm_host_smccc_func {
>         __KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_guest,
>         __KVM_HOST_SMCCC_FUNC___pkvm_host_relax_guest_perms,
>         __KVM_HOST_SMCCC_FUNC___pkvm_host_wrprotect_guest,
> +       __KVM_HOST_SMCCC_FUNC___pkvm_host_test_clear_young_guest,
>         __KVM_HOST_SMCCC_FUNC___kvm_adjust_pc,
>         __KVM_HOST_SMCCC_FUNC___kvm_vcpu_run,
>         __KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> index 8658b5932473..554ce31882e6 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> @@ -43,6 +43,7 @@ int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu, enum k
>  int __pkvm_host_unshare_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm);
>  int __pkvm_host_relax_guest_perms(u64 gfn, enum kvm_pgtable_prot prot, struct pkvm_hyp_vcpu *vcpu);
>  int __pkvm_host_wrprotect_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm);
> +int __pkvm_host_test_clear_young_guest(u64 gfn, bool mkold, struct pkvm_hyp_vm *vm);

While I'm piling on about function names and parameters: some functions
have _guest as a suffix (e.g., this one), while others have it in the
middle (__pkvm_host_relax_guest_perms). I guess
__pkvm_host_relax_guest_perms is the odd one out. Could you rename it?

Cheers,
/fuad


>  bool addr_is_memory(phys_addr_t phys);
>  int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index 3feaf2119e51..67cb6e284180 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> @@ -309,6 +309,30 @@ static void handle___pkvm_host_wrprotect_guest(struct kvm_cpu_context *host_ctxt
>         cpu_reg(host_ctxt, 1) = ret;
>  }
>
> +static void handle___pkvm_host_test_clear_young_guest(struct kvm_cpu_context *host_ctxt)
> +{
> +       DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
> +       DECLARE_REG(u64, gfn, host_ctxt, 2);
> +       DECLARE_REG(bool, mkold, host_ctxt, 3);
> +       struct pkvm_hyp_vm *hyp_vm;
> +       int ret = -EINVAL;
> +
> +       if (!is_protected_kvm_enabled())
> +               goto out;
> +
> +       hyp_vm = get_pkvm_hyp_vm(handle);
> +       if (!hyp_vm)
> +               goto out;
> +       if (pkvm_hyp_vm_is_protected(hyp_vm))
> +               goto put_hyp_vm;
> +
> +       ret = __pkvm_host_test_clear_young_guest(gfn, mkold, hyp_vm);
> +put_hyp_vm:
> +       put_pkvm_hyp_vm(hyp_vm);
> +out:
> +       cpu_reg(host_ctxt, 1) = ret;
> +}
> +
>  static void handle___kvm_adjust_pc(struct kvm_cpu_context *host_ctxt)
>  {
>         DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
> @@ -522,6 +546,7 @@ static const hcall_t host_hcall[] = {
>         HANDLE_FUNC(__pkvm_host_unshare_guest),
>         HANDLE_FUNC(__pkvm_host_relax_guest_perms),
>         HANDLE_FUNC(__pkvm_host_wrprotect_guest),
> +       HANDLE_FUNC(__pkvm_host_test_clear_young_guest),
>         HANDLE_FUNC(__kvm_adjust_pc),
>         HANDLE_FUNC(__kvm_vcpu_run),
>         HANDLE_FUNC(__kvm_flush_vm_context),
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 89312d7cde2a..0e064a7ed7c4 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -1522,3 +1522,22 @@ int __pkvm_host_wrprotect_guest(u64 gfn, struct pkvm_hyp_vm *vm)
>
>         return ret;
>  }
> +
> +int __pkvm_host_test_clear_young_guest(u64 gfn, bool mkold, struct pkvm_hyp_vm *vm)
> +{
> +       u64 ipa = hyp_pfn_to_phys(gfn);
> +       u64 phys;
> +       int ret;
> +
> +       host_lock_component();
> +       guest_lock_component(vm);
> +
> +       ret = __check_host_unshare_guest(vm, &phys, ipa);
> +       if (!ret)
> +               ret = kvm_pgtable_stage2_test_clear_young(&vm->pgt, ipa, PAGE_SIZE, mkold);
> +
> +       guest_unlock_component(vm);
> +       host_unlock_component();
> +
> +       return ret;
> +}
> --
> 2.47.0.338.g60cca15819-goog
>


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 15/18] KVM: arm64: Introduce __pkvm_host_mkyoung_guest()
  2024-12-03 10:37 ` [PATCH v2 15/18] KVM: arm64: Introduce __pkvm_host_mkyoung_guest() Quentin Perret
@ 2024-12-10 15:14   ` Fuad Tabba
  2024-12-10 19:46     ` Quentin Perret
  0 siblings, 1 reply; 50+ messages in thread
From: Fuad Tabba @ 2024-12-10 15:14 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Vincent Donnefort,
	Sebastian Ene, linux-arm-kernel, kvmarm, linux-kernel

Hi Quentin,

On Tue, 3 Dec 2024 at 10:38, Quentin Perret <qperret@google.com> wrote:
>
> Plumb the kvm_pgtable_stage2_mkyoung() callback into pKVM for
> non-protected guests. It will be called later from the fault handling
> path.
>
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/include/asm/kvm_asm.h              |  1 +
>  arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  1 +
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c            | 19 ++++++++++++++++++
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 20 +++++++++++++++++++
>  4 files changed, 41 insertions(+)
>
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index cb676017d591..6178e12a0dbc 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -70,6 +70,7 @@ enum __kvm_host_smccc_func {
>         __KVM_HOST_SMCCC_FUNC___pkvm_host_relax_guest_perms,
>         __KVM_HOST_SMCCC_FUNC___pkvm_host_wrprotect_guest,
>         __KVM_HOST_SMCCC_FUNC___pkvm_host_test_clear_young_guest,
> +       __KVM_HOST_SMCCC_FUNC___pkvm_host_mkyoung_guest,
>         __KVM_HOST_SMCCC_FUNC___kvm_adjust_pc,
>         __KVM_HOST_SMCCC_FUNC___kvm_vcpu_run,
>         __KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> index 554ce31882e6..3ae0c3ecff48 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> @@ -44,6 +44,7 @@ int __pkvm_host_unshare_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm);
>  int __pkvm_host_relax_guest_perms(u64 gfn, enum kvm_pgtable_prot prot, struct pkvm_hyp_vcpu *vcpu);
>  int __pkvm_host_wrprotect_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm);
>  int __pkvm_host_test_clear_young_guest(u64 gfn, bool mkold, struct pkvm_hyp_vm *vm);
> +int __pkvm_host_mkyoung_guest(u64 gfn, struct pkvm_hyp_vcpu *vcpu);
>
>  bool addr_is_memory(phys_addr_t phys);
>  int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index 67cb6e284180..de0012a75827 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> @@ -333,6 +333,24 @@ static void handle___pkvm_host_test_clear_young_guest(struct kvm_cpu_context *ho
>         cpu_reg(host_ctxt, 1) = ret;
>  }
>
> +static void handle___pkvm_host_mkyoung_guest(struct kvm_cpu_context *host_ctxt)
> +{
> +       DECLARE_REG(u64, gfn, host_ctxt, 1);
> +       struct pkvm_hyp_vcpu *hyp_vcpu;
> +       int ret = -EINVAL;
> +
> +       if (!is_protected_kvm_enabled())
> +               goto out;
> +
> +       hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
> +       if (!hyp_vcpu || pkvm_hyp_vcpu_is_protected(hyp_vcpu))
> +               goto out;
> +
> +       ret = __pkvm_host_mkyoung_guest(gfn, hyp_vcpu);
> +out:
> +       cpu_reg(host_ctxt, 1) =  ret;
> +}
> +
>  static void handle___kvm_adjust_pc(struct kvm_cpu_context *host_ctxt)
>  {
>         DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
> @@ -547,6 +565,7 @@ static const hcall_t host_hcall[] = {
>         HANDLE_FUNC(__pkvm_host_relax_guest_perms),
>         HANDLE_FUNC(__pkvm_host_wrprotect_guest),
>         HANDLE_FUNC(__pkvm_host_test_clear_young_guest),
> +       HANDLE_FUNC(__pkvm_host_mkyoung_guest),
>         HANDLE_FUNC(__kvm_adjust_pc),
>         HANDLE_FUNC(__kvm_vcpu_run),
>         HANDLE_FUNC(__kvm_flush_vm_context),
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 0e064a7ed7c4..7605bd7f80b5 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -1541,3 +1541,23 @@ int __pkvm_host_test_clear_young_guest(u64 gfn, bool mkold, struct pkvm_hyp_vm *
>
>         return ret;
>  }
> +
> +int __pkvm_host_mkyoung_guest(u64 gfn, struct pkvm_hyp_vcpu *vcpu)
> +{
> +       struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
> +       u64 ipa = hyp_pfn_to_phys(gfn);
> +       u64 phys;
> +       int ret;
> +
> +       host_lock_component();
> +       guest_lock_component(vm);
> +
> +       ret = __check_host_unshare_guest(vm, &phys, ipa);

While I'm bikeshedding some more, does the name
__check_host_unshare_guest() make sense here? Should it be something like
__check_host_changeperm_guest() instead? (feel
free to ignore this :) )

Thanks,
/fuad

> +       if (!ret)
> +               kvm_pgtable_stage2_mkyoung(&vm->pgt, ipa, 0);
> +
> +       guest_unlock_component(vm);
> +       host_unlock_component();
> +
> +       return ret;
> +}
> --
> 2.47.0.338.g60cca15819-goog
>



* Re: [PATCH v2 01/18] KVM: arm64: Change the layout of enum pkvm_page_state
  2024-12-10 12:59   ` Fuad Tabba
@ 2024-12-10 15:15     ` Quentin Perret
  0 siblings, 0 replies; 50+ messages in thread
From: Quentin Perret @ 2024-12-10 15:15 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Vincent Donnefort,
	Sebastian Ene, linux-arm-kernel, kvmarm, linux-kernel

Hey Fuad,

On Tuesday 10 Dec 2024 at 12:59:44 (+0000), Fuad Tabba wrote:
> Hi Quentin,
> 
> On Tue, 3 Dec 2024 at 10:37, Quentin Perret <qperret@google.com> wrote:
> >
> > The 'concrete' (a.k.a non-meta) page states are currently encoded using
> > software bits in PTEs. For performance reasons, the abstract
> > pkvm_page_state enum uses the same bits to encode these states as that
> > makes conversions from and to PTEs easy.
> >
> > In order to prepare the ground for moving the 'concrete' state storage
> > to the hyp vmemmap, re-arrange the enum to use bits 0 and 1 for this
> > purpose.
> >
> > No functional changes intended.
> >
> > Signed-off-by: Quentin Perret <qperret@google.com>
> > ---
> >  arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 17 ++++++++++-------
> >  1 file changed, 10 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > index 0972faccc2af..ca3177481b78 100644
> > --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > @@ -24,25 +24,28 @@
> >   */
> >  enum pkvm_page_state {
> >         PKVM_PAGE_OWNED                 = 0ULL,
> > -       PKVM_PAGE_SHARED_OWNED          = KVM_PGTABLE_PROT_SW0,
> > -       PKVM_PAGE_SHARED_BORROWED       = KVM_PGTABLE_PROT_SW1,
> > -       __PKVM_PAGE_RESERVED            = KVM_PGTABLE_PROT_SW0 |
> > -                                         KVM_PGTABLE_PROT_SW1,
> > +       PKVM_PAGE_SHARED_OWNED          = BIT(0),
> > +       PKVM_PAGE_SHARED_BORROWED       = BIT(1),
> > +       __PKVM_PAGE_RESERVED            = BIT(0) | BIT(1),
> >
> >         /* Meta-states which aren't encoded directly in the PTE's SW bits */
> > -       PKVM_NOPAGE,
> > +       PKVM_NOPAGE                     = BIT(2),
> >  };
> > +#define PKVM_PAGE_META_STATES_MASK     (~(BIT(0) | BIT(1)))
> >
> >  #define PKVM_PAGE_STATE_PROT_MASK      (KVM_PGTABLE_PROT_SW0 | KVM_PGTABLE_PROT_SW1)
> >  static inline enum kvm_pgtable_prot pkvm_mkstate(enum kvm_pgtable_prot prot,
> >                                                  enum pkvm_page_state state)
> >  {
> > -       return (prot & ~PKVM_PAGE_STATE_PROT_MASK) | state;
> > +       BUG_ON(state & PKVM_PAGE_META_STATES_MASK);
> 
> This is a slight change in functionality, having a BUG_ON instead of
> just masking out illegal states. Is it necessary?

Yep, this is arguably a bit zealous. Passing e.g. PKVM_NOPAGE to
pkvm_mkstate() would be properly bogus, so having a WARN_ON() or
BUG_ON() in there is still a good thing, but it should be done in a
separate patch.

I'll rework in v3.

> > +       prot &= ~PKVM_PAGE_STATE_PROT_MASK;
> > +       prot |= FIELD_PREP(PKVM_PAGE_STATE_PROT_MASK, state);
> > +       return prot;
> >  }
> >
> >  static inline enum pkvm_page_state pkvm_getstate(enum kvm_pgtable_prot prot)
> >  {
> > -       return prot & PKVM_PAGE_STATE_PROT_MASK;
> > +       return FIELD_GET(PKVM_PAGE_STATE_PROT_MASK, prot);
> >  }
> >
> >  struct host_mmu {
> > --
> > 2.47.0.338.g60cca15819-goog
> >

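The following is a minimal userspace sketch of what the reworked pkvm_mkstate()/pkvm_getstate() pair does with the new enum layout. It is an illustration, not the kernel code, and makes these assumptions:

- FIELD_PREP()/FIELD_GET() are re-implemented locally with the GCC/Clang builtin __builtin_ctzll().
- The PTE software bits are placed at bits 55 and 56 (PROT_SW0/PROT_SW1 are hypothetical local names).
- prot is a plain uint64_t.
- assert() stands in for BUG_ON().

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n) (1ULL << (n))

/* Local stand-ins for the kernel's FIELD_PREP()/FIELD_GET() helpers. */
#define FIELD_PREP(mask, val) \
	(((uint64_t)(val) << __builtin_ctzll(mask)) & (mask))
#define FIELD_GET(mask, reg) \
	(((uint64_t)(reg) & (mask)) >> __builtin_ctzll(mask))

/* Assumed positions of the two PTE software bits, for illustration only. */
#define PROT_SW0 BIT(55)
#define PROT_SW1 BIT(56)
#define PKVM_PAGE_STATE_PROT_MASK (PROT_SW0 | PROT_SW1)

/* New layout: concrete states in bits 0-1, meta-states above them. */
enum pkvm_page_state {
	PKVM_PAGE_OWNED           = 0ULL,
	PKVM_PAGE_SHARED_OWNED    = BIT(0),
	PKVM_PAGE_SHARED_BORROWED = BIT(1),
	__PKVM_PAGE_RESERVED      = BIT(0) | BIT(1),
	PKVM_NOPAGE               = BIT(2),
};
#define PKVM_PAGE_META_STATES_MASK (~(BIT(0) | BIT(1)))

/* Shift a concrete state (bits 0-1) into the PTE software bits. */
static uint64_t pkvm_mkstate(uint64_t prot, enum pkvm_page_state state)
{
	/* Meta-states such as PKVM_NOPAGE can never be encoded in a PTE. */
	assert(!(state & PKVM_PAGE_META_STATES_MASK));
	prot &= ~PKVM_PAGE_STATE_PROT_MASK;
	prot |= FIELD_PREP(PKVM_PAGE_STATE_PROT_MASK, state);
	return prot;
}

/* Extract the software bits and shift them back down to bits 0-1. */
static enum pkvm_page_state pkvm_getstate(uint64_t prot)
{
	return (enum pkvm_page_state)FIELD_GET(PKVM_PAGE_STATE_PROT_MASK, prot);
}
```

The design point is that the concrete states become small integers. They can then be stored compactly outside the page-table, for example in an 8-bit field of struct hyp_page. The shift into the software bits happens only when converting to and from a PTE.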


* Re: [PATCH v2 16/18] KVM: arm64: Introduce __pkvm_tlb_flush_vmid()
  2024-12-03 10:37 ` [PATCH v2 16/18] KVM: arm64: Introduce __pkvm_tlb_flush_vmid() Quentin Perret
@ 2024-12-10 15:23   ` Fuad Tabba
  2024-12-11 10:03     ` Quentin Perret
  0 siblings, 1 reply; 50+ messages in thread
From: Fuad Tabba @ 2024-12-10 15:23 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Vincent Donnefort,
	Sebastian Ene, linux-arm-kernel, kvmarm, linux-kernel

Hi Quentin,

On Tue, 3 Dec 2024 at 10:38, Quentin Perret <qperret@google.com> wrote:
>
> Introduce a new hypercall to flush the TLBs of non-protected guests. The
> host kernel will be responsible for issuing this hypercall after changing
> stage-2 permissions using the __pkvm_host_relax_guest_perms() or
> __pkvm_host_wrprotect_guest() paths. This is left under the host's
> responsibility for performance reasons.
>
> Note however that the TLB maintenance for all *unmap* operations still
> remains entirely under the hypervisor's responsibility for security
> reasons -- an unmapped page may be donated to another entity, so a stale
> TLB entry could be used to leak private data.
>
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/include/asm/kvm_asm.h   |  1 +
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c | 17 +++++++++++++++++
>  2 files changed, 18 insertions(+)
>
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index 6178e12a0dbc..df6237d0459c 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -87,6 +87,7 @@ enum __kvm_host_smccc_func {
>         __KVM_HOST_SMCCC_FUNC___pkvm_teardown_vm,
>         __KVM_HOST_SMCCC_FUNC___pkvm_vcpu_load,
>         __KVM_HOST_SMCCC_FUNC___pkvm_vcpu_put,
> +       __KVM_HOST_SMCCC_FUNC___pkvm_tlb_flush_vmid,
>  };
>
>  #define DECLARE_KVM_VHE_SYM(sym)       extern char sym[]
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index de0012a75827..219d7fb850ec 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> @@ -398,6 +398,22 @@ static void handle___kvm_tlb_flush_vmid(struct kvm_cpu_context *host_ctxt)
>         __kvm_tlb_flush_vmid(kern_hyp_va(mmu));
>  }
>
> +static void handle___pkvm_tlb_flush_vmid(struct kvm_cpu_context *host_ctxt)
> +{
> +       DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
> +       struct pkvm_hyp_vm *hyp_vm;
> +
> +       if (!is_protected_kvm_enabled())
> +               return;
> +
> +       hyp_vm = get_pkvm_hyp_vm(handle);
> +       if (!hyp_vm)
> +               return;
> +
> +       __kvm_tlb_flush_vmid(&hyp_vm->kvm.arch.mmu);
> +       put_pkvm_hyp_vm(hyp_vm);
> +}

Since this is practically the same as kvm_tlb_flush_vmid(), does it
make sense to modify that instead (handle___kvm_tlb_flush_vmid()) to
do the right thing depending on whether pkvm is enabled? I'm also
thinking ahead, in case we want to support the rest of the
kvm_tlb_flush_vmid_*() family in the future.

Cheers,
/fuad

> +
>  static void handle___kvm_flush_cpu_context(struct kvm_cpu_context *host_ctxt)
>  {
>         DECLARE_REG(struct kvm_s2_mmu *, mmu, host_ctxt, 1);
> @@ -582,6 +598,7 @@ static const hcall_t host_hcall[] = {
>         HANDLE_FUNC(__pkvm_teardown_vm),
>         HANDLE_FUNC(__pkvm_vcpu_load),
>         HANDLE_FUNC(__pkvm_vcpu_put),
> +       HANDLE_FUNC(__pkvm_tlb_flush_vmid),
>  };
>
>  static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
> --
> 2.47.0.338.g60cca15819-goog
>



* Re: [PATCH v2 04/18] KVM: arm64: Move host page ownership tracking to the hyp vmemmap
  2024-12-10 13:02   ` Fuad Tabba
@ 2024-12-10 15:29     ` Quentin Perret
  2024-12-10 15:46       ` Fuad Tabba
  0 siblings, 1 reply; 50+ messages in thread
From: Quentin Perret @ 2024-12-10 15:29 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Vincent Donnefort,
	Sebastian Ene, linux-arm-kernel, kvmarm, linux-kernel

Hey Fuad,

On Tuesday 10 Dec 2024 at 13:02:45 (+0000), Fuad Tabba wrote:
> Hi Quentin,
> 
> On Tue, 3 Dec 2024 at 10:37, Quentin Perret <qperret@google.com> wrote:
> >
> > We currently store part of the page-tracking state in PTE software bits
> > for the host, guests and the hypervisor. This is sub-optimal when, e.g.,
> > sharing pages, as it forces us to break block mappings purely to support
> > this software tracking. This causes an unnecessarily fragmented stage-2
> > page-table for the host in particular when it shares pages with Secure,
> > which can lead to measurable regressions. Moreover, having this state
> > stored in the page-table forces us to do multiple costly walks on the
> > page transition path, hence causing overhead.
> >
> > In order to work around these problems, move the host-side page-tracking
> > logic from SW bits in its stage-2 PTEs to the hypervisor's vmemmap.
> >
> > Signed-off-by: Quentin Perret <qperret@google.com>
> > ---
> >  arch/arm64/kvm/hyp/include/nvhe/memory.h |  6 +-
> >  arch/arm64/kvm/hyp/nvhe/mem_protect.c    | 94 ++++++++++++++++--------
> >  arch/arm64/kvm/hyp/nvhe/setup.c          |  7 +-
> >  3 files changed, 71 insertions(+), 36 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
> > index 88cb8ff9e769..08f3a0416d4c 100644
> > --- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
> > +++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
> > @@ -8,7 +8,7 @@
> >  #include <linux/types.h>
> >
> >  /*
> > - * SW bits 0-1 are reserved to track the memory ownership state of each page:
> > + * Bits 0-1 are reserved to track the memory ownership state of each page:
> >   *   00: The page is owned exclusively by the page-table owner.
> >   *   01: The page is owned by the page-table owner, but is shared
> >   *       with another entity.
> 
> Not shown in this patch, but a couple of lines below, you might want
> to update the comment on PKVM_NOPAGE to fix the reference to "PTE's SW
> bits":

I actually think the comment is still correct -- PKVM_NOPAGE never goes
in the software bits, with or without this patch, so I figured we could
leave it as-is. But happy to reword if you have a good idea :)

> > /* Meta-states which aren't encoded directly in the PTE's SW bits */
> > PKVM_NOPAGE = BIT(2),
> 
> > @@ -44,7 +44,9 @@ static inline enum pkvm_page_state pkvm_getstate(enum kvm_pgtable_prot prot)
> >  struct hyp_page {
> >         u16 refcount;
> >         u8 order;
> > -       u8 reserved;
> > +
> > +       /* Host (non-meta) state. Guarded by the host stage-2 lock. */
> > +       enum pkvm_page_state host_state : 8;
> >  };
> >
> >  extern u64 __hyp_vmemmap;
> > diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > index caba3e4bd09e..1595081c4f6b 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > @@ -201,8 +201,8 @@ static void *guest_s2_zalloc_page(void *mc)
> >
> >         memset(addr, 0, PAGE_SIZE);
> >         p = hyp_virt_to_page(addr);
> > -       memset(p, 0, sizeof(*p));
> >         p->refcount = 1;
> > +       p->order = 0;
> >
> >         return addr;
> >  }
> > @@ -268,6 +268,7 @@ int kvm_guest_prepare_stage2(struct pkvm_hyp_vm *vm, void *pgd)
> >
> >  void reclaim_guest_pages(struct pkvm_hyp_vm *vm, struct kvm_hyp_memcache *mc)
> >  {
> > +       struct hyp_page *page;
> >         void *addr;
> >
> >         /* Dump all pgtable pages in the hyp_pool */
> > @@ -279,7 +280,9 @@ void reclaim_guest_pages(struct pkvm_hyp_vm *vm, struct kvm_hyp_memcache *mc)
> >         /* Drain the hyp_pool into the memcache */
> >         addr = hyp_alloc_pages(&vm->pool, 0);
> >         while (addr) {
> > -               memset(hyp_virt_to_page(addr), 0, sizeof(struct hyp_page));
> > +               page = hyp_virt_to_page(addr);
> > +               page->refcount = 0;
> > +               page->order = 0;
> >                 push_hyp_memcache(mc, addr, hyp_virt_to_phys);
> >                 WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(addr), 1));
> >                 addr = hyp_alloc_pages(&vm->pool, 0);
> > @@ -382,19 +385,25 @@ bool addr_is_memory(phys_addr_t phys)
> >         return !!find_mem_range(phys, &range);
> >  }
> >
> > -static bool addr_is_allowed_memory(phys_addr_t phys)
> > +static bool is_in_mem_range(u64 addr, struct kvm_mem_range *range)
> > +{
> > +       return range->start <= addr && addr < range->end;
> > +}
> > +
> > +static int range_is_allowed_memory(u64 start, u64 end)
> 
> The name of this function "range_*is*_..." implies that it returns a
> boolean, which other functions in this file (and patch) with similar
> names do, but it returns an error instead. Maybe
> check_range_allowed_memory()?

Ack, I'll rename in v3.

> >  {
> >         struct memblock_region *reg;
> >         struct kvm_mem_range range;
> >
> > -       reg = find_mem_range(phys, &range);
> > +       /* Can't check the state of both MMIO and memory regions at once */
> 
> I don't understand this comment in relation to the code. Could you
> explain it to me please?

find_mem_range() searches the list of memblocks to find the 'range'
in which @start falls. That range is either a memblock (so @start is
memory, and @reg != NULL) or a gap between memblocks (so @start is MMIO,
and @reg == NULL). The check right after ensures that the last byte of
the range (@end - 1) falls in the same PA range as @start. IOW, this
checks that [start, end) doesn't straddle memory and MMIO, because the
following logic wouldn't work for a mixed case like that.

> > +       reg = find_mem_range(start, &range);
> > +       if (!is_in_mem_range(end - 1, &range))
> > +               return -EINVAL;
> >
> > -       return reg && !(reg->flags & MEMBLOCK_NOMAP);
> > -}
> > +       if (!reg || reg->flags & MEMBLOCK_NOMAP)
> > +               return -EPERM;
> >
> > -static bool is_in_mem_range(u64 addr, struct kvm_mem_range *range)
> > -{
> > -       return range->start <= addr && addr < range->end;
> > +       return 0;
> >  }
> >
> >  static bool range_is_memory(u64 start, u64 end)
> > @@ -454,8 +463,11 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
> >         if (kvm_pte_valid(pte))
> >                 return -EAGAIN;
> >
> > -       if (pte)
> > +       if (pte) {
> > +               WARN_ON(addr_is_memory(addr) &&
> > +                       !(hyp_phys_to_page(addr)->host_state & PKVM_NOPAGE));
> 
> nit: since the host state is now an enum, should this just be an
> equality check rather than an &? This makes it consistent with other
> checks of pkvm_page_state in this patch too.

We don't currently have a state that is additive to PKVM_NOPAGE, so no
objection from me.

> >                 return -EPERM;
> > +       }
> >
> >         do {
> >                 u64 granule = kvm_granule_size(level);
> > @@ -477,10 +489,29 @@ int host_stage2_idmap_locked(phys_addr_t addr, u64 size,
> >         return host_stage2_try(__host_stage2_idmap, addr, addr + size, prot);
> >  }
> >
> > +static void __host_update_page_state(phys_addr_t addr, u64 size, enum pkvm_page_state state)
> > +{
> > +       phys_addr_t end = addr + size;
> 
> nit: newline
> 
> > +       for (; addr < end; addr += PAGE_SIZE)
> > +               hyp_phys_to_page(addr)->host_state = state;
> > +}
> > +
> >  int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
> >  {
> > -       return host_stage2_try(kvm_pgtable_stage2_set_owner, &host_mmu.pgt,
> > -                              addr, size, &host_s2_pool, owner_id);
> > +       int ret;
> > +
> > +       ret = host_stage2_try(kvm_pgtable_stage2_set_owner, &host_mmu.pgt,
> > +                             addr, size, &host_s2_pool, owner_id);
> > +       if (ret || !addr_is_memory(addr))
> > +               return ret;
> 
> Can hyp set an owner for an address that isn't memory? Trying to
> understand why we need to update the host stage2 pagetable but not the
> hypervisor's vmemmap in that case.

I think the answer is 'not currently', but it will be once we have to,
e.g., donate IOMMU registers to EL2 and things of that nature. Note that
this does require an extension to __host_check_page_state_range() to
query the page-table 'the old way' for MMIO addresses, though that isn't
done in this series. If you feel strongly that this is confusing, I'm
happy to drop that check and we'll add it back with the IOMMU series or
something like that.

> > +
> > +       /* Don't forget to update the vmemmap tracking for the host */
> > +       if (owner_id == PKVM_ID_HOST)
> > +               __host_update_page_state(addr, size, PKVM_PAGE_OWNED);
> > +       else
> > +               __host_update_page_state(addr, size, PKVM_NOPAGE);
> > +
> > +       return 0;
> >  }
> >
> >  static bool host_stage2_force_pte_cb(u64 addr, u64 end, enum kvm_pgtable_prot prot)
> > @@ -604,35 +635,38 @@ static int check_page_state_range(struct kvm_pgtable *pgt, u64 addr, u64 size,
> >         return kvm_pgtable_walk(pgt, addr, size, &walker);
> >  }
> >
> > -static enum pkvm_page_state host_get_page_state(kvm_pte_t pte, u64 addr)
> > -{
> > -       if (!addr_is_allowed_memory(addr))
> > -               return PKVM_NOPAGE;
> > -
> > -       if (!kvm_pte_valid(pte) && pte)
> > -               return PKVM_NOPAGE;
> > -
> > -       return pkvm_getstate(kvm_pgtable_stage2_pte_prot(pte));
> > -}
> > -
> >  static int __host_check_page_state_range(u64 addr, u64 size,
> >                                          enum pkvm_page_state state)
> >  {
> > -       struct check_walk_data d = {
> > -               .desired        = state,
> > -               .get_page_state = host_get_page_state,
> > -       };
> > +       u64 end = addr + size;
> > +       int ret;
> > +
> > +       ret = range_is_allowed_memory(addr, end);
> > +       if (ret)
> > +               return ret;
> >
> >         hyp_assert_lock_held(&host_mmu.lock);
> > -       return check_page_state_range(&host_mmu.pgt, addr, size, &d);
> > +       for (; addr < end; addr += PAGE_SIZE) {
> > +               if (hyp_phys_to_page(addr)->host_state != state)
> > +                       return -EPERM;
> > +       }
> > +
> > +       return 0;
> >  }
> >
> >  static int __host_set_page_state_range(u64 addr, u64 size,
> >                                        enum pkvm_page_state state)
> >  {
> > -       enum kvm_pgtable_prot prot = pkvm_mkstate(PKVM_HOST_MEM_PROT, state);
> > +       if (hyp_phys_to_page(addr)->host_state & PKVM_NOPAGE) {
> 
> Same nit as above regarding checking for PKVM_NOPAGE
> 
> Cheers,
> /fuad
> 
> 
> > +               int ret = host_stage2_idmap_locked(addr, size, PKVM_HOST_MEM_PROT);
> >
> > -       return host_stage2_idmap_locked(addr, size, prot);
> > +               if (ret)
> > +                       return ret;
> > +       }
> > +
> > +       __host_update_page_state(addr, size, state);
> > +
> > +       return 0;
> >  }
> >
> >  static int host_request_owned_transition(u64 *completer_addr,
> > diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
> > index cbdd18cd3f98..7e04d1c2a03d 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/setup.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
> > @@ -180,7 +180,6 @@ static void hpool_put_page(void *addr)
> >  static int fix_host_ownership_walker(const struct kvm_pgtable_visit_ctx *ctx,
> >                                      enum kvm_pgtable_walk_flags visit)
> >  {
> > -       enum kvm_pgtable_prot prot;
> >         enum pkvm_page_state state;
> >         phys_addr_t phys;
> >
> > @@ -203,16 +202,16 @@ static int fix_host_ownership_walker(const struct kvm_pgtable_visit_ctx *ctx,
> >         case PKVM_PAGE_OWNED:
> >                 return host_stage2_set_owner_locked(phys, PAGE_SIZE, PKVM_ID_HYP);
> >         case PKVM_PAGE_SHARED_OWNED:
> > -               prot = pkvm_mkstate(PKVM_HOST_MEM_PROT, PKVM_PAGE_SHARED_BORROWED);
> > +               hyp_phys_to_page(phys)->host_state = PKVM_PAGE_SHARED_BORROWED;
> >                 break;
> >         case PKVM_PAGE_SHARED_BORROWED:
> > -               prot = pkvm_mkstate(PKVM_HOST_MEM_PROT, PKVM_PAGE_SHARED_OWNED);
> > +               hyp_phys_to_page(phys)->host_state = PKVM_PAGE_SHARED_OWNED;
> >                 break;
> >         default:
> >                 return -EINVAL;
> >         }
> >
> > -       return host_stage2_idmap_locked(phys, PAGE_SIZE, prot);
> > +       return 0;
> >  }
> >
> >  static int fix_hyp_pgtable_refcnt_walker(const struct kvm_pgtable_visit_ctx *ctx,
> > --
> > 2.47.0.338.g60cca15819-goog
> >

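To illustrate the idea behind this patch, here is a minimal userspace sketch of host state tracking in per-page metadata instead of PTE software bits. It is a model under stated assumptions, not the hypervisor code:

- the fixed-size vmemmap array, NR_PAGES and phys_to_page() are hypothetical;
- host_state is a plain uint8_t rather than an 8-bit enum bitfield;
- all pages start as PKVM_PAGE_OWNED purely because the array is zero-initialised.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)
#define NR_PAGES   16 /* tiny hypothetical physical address space */

enum pkvm_page_state {
	PKVM_PAGE_OWNED           = 0,
	PKVM_PAGE_SHARED_OWNED    = 1 << 0,
	PKVM_PAGE_SHARED_BORROWED = 1 << 1,
	PKVM_NOPAGE               = 1 << 2,
};

/* Simplified per-page metadata, mirroring the fields of struct hyp_page. */
struct hyp_page {
	uint16_t refcount;
	uint8_t order;
	uint8_t host_state; /* enum pkvm_page_state; host stage-2 lock held */
};

/* Stand-in for the hypervisor's vmemmap: one entry per page frame. */
static struct hyp_page vmemmap[NR_PAGES];

static struct hyp_page *phys_to_page(uint64_t phys)
{
	assert((phys >> PAGE_SHIFT) < NR_PAGES);
	return &vmemmap[phys >> PAGE_SHIFT];
}

/* Record a new host state for every page in [addr, addr + size). */
static void host_update_page_state(uint64_t addr, uint64_t size,
				   enum pkvm_page_state state)
{
	uint64_t end = addr + size;

	for (; addr < end; addr += PAGE_SIZE)
		phys_to_page(addr)->host_state = state;
}

/* Check every page in [addr, addr + size) is in @state, without a pgtable walk. */
static int host_check_page_state_range(uint64_t addr, uint64_t size,
				       enum pkvm_page_state state)
{
	uint64_t end = addr + size;

	for (; addr < end; addr += PAGE_SIZE) {
		if (phys_to_page(addr)->host_state != state)
			return -EPERM;
	}
	return 0;
}
```

Because the state no longer lives in the PTE, sharing a single page does not require splitting a block mapping in the host stage-2. In the patch, the page-table only needs touching when a page leaves PKVM_NOPAGE, as in __host_set_page_state_range().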


* Re: [PATCH v2 10/18] KVM: arm64: Introduce __pkvm_host_share_guest()
  2024-12-10 13:58   ` Fuad Tabba
@ 2024-12-10 15:41     ` Quentin Perret
  2024-12-10 15:51       ` Fuad Tabba
  0 siblings, 1 reply; 50+ messages in thread
From: Quentin Perret @ 2024-12-10 15:41 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Vincent Donnefort,
	Sebastian Ene, linux-arm-kernel, kvmarm, linux-kernel

On Tuesday 10 Dec 2024 at 13:58:42 (+0000), Fuad Tabba wrote:
> Hi Quentin,
> 
> On Tue, 3 Dec 2024 at 10:37, Quentin Perret <qperret@google.com> wrote:
> >
> > In preparation for handling guest stage-2 mappings at EL2, introduce a
> > new pKVM hypercall allowing to share pages with non-protected guests.
> >
> > Signed-off-by: Quentin Perret <qperret@google.com>
> > ---
> >  arch/arm64/include/asm/kvm_asm.h              |  1 +
> >  arch/arm64/include/asm/kvm_host.h             |  3 +
> >  arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  1 +
> >  arch/arm64/kvm/hyp/include/nvhe/memory.h      |  2 +
> >  arch/arm64/kvm/hyp/nvhe/hyp-main.c            | 34 +++++++++
> >  arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 70 +++++++++++++++++++
> >  arch/arm64/kvm/hyp/nvhe/pkvm.c                |  7 ++
> >  7 files changed, 118 insertions(+)
> >
> > diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> > index 89c0fac69551..449337f5b2a3 100644
> > --- a/arch/arm64/include/asm/kvm_asm.h
> > +++ b/arch/arm64/include/asm/kvm_asm.h
> > @@ -65,6 +65,7 @@ enum __kvm_host_smccc_func {
> >         /* Hypercalls available after pKVM finalisation */
> >         __KVM_HOST_SMCCC_FUNC___pkvm_host_share_hyp,
> >         __KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_hyp,
> > +       __KVM_HOST_SMCCC_FUNC___pkvm_host_share_guest,
> >         __KVM_HOST_SMCCC_FUNC___kvm_adjust_pc,
> >         __KVM_HOST_SMCCC_FUNC___kvm_vcpu_run,
> >         __KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index e18e9244d17a..f75988e3515b 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -771,6 +771,9 @@ struct kvm_vcpu_arch {
> >         /* Cache some mmu pages needed inside spinlock regions */
> >         struct kvm_mmu_memory_cache mmu_page_cache;
> >
> > +       /* Pages to be donated to pkvm/EL2 if it runs out */
> 
> Runs out of what? :) I'm being facetious; it's just that the comment
> is a bit unclear.

	/* Pages to top-up the pKVM/EL2 guest pool */

Is that any better?

> > +       struct kvm_hyp_memcache pkvm_memcache;
> > +
> >         /* Virtual SError ESR to restore when HCR_EL2.VSE is set */
> >         u64 vsesr_el2;
> >
> > diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > index 25038ac705d8..a7976e50f556 100644
> > --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > @@ -39,6 +39,7 @@ int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages);
> >  int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
> >  int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
> >  int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
> > +int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu, enum kvm_pgtable_prot prot);
> >
> >  bool addr_is_memory(phys_addr_t phys);
> >  int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
> > diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
> > index 08f3a0416d4c..457318215155 100644
> > --- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
> > +++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
> > @@ -47,6 +47,8 @@ struct hyp_page {
> >
> >         /* Host (non-meta) state. Guarded by the host stage-2 lock. */
> >         enum pkvm_page_state host_state : 8;
> > +
> > +       u32 host_share_guest_count;
> >  };
> >
> >  extern u64 __hyp_vmemmap;
> > diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > index 95d78db315b3..d659462fbf5d 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > @@ -211,6 +211,39 @@ static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
> >         cpu_reg(host_ctxt, 1) =  ret;
> >  }
> >
> > +static int pkvm_refill_memcache(struct pkvm_hyp_vcpu *hyp_vcpu)
> > +{
> > +       struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
> > +
> > +       return refill_memcache(&hyp_vcpu->vcpu.arch.pkvm_memcache,
> > +                              host_vcpu->arch.pkvm_memcache.nr_pages,
> > +                              &host_vcpu->arch.pkvm_memcache);
> > +}
> > +
> > +static void handle___pkvm_host_share_guest(struct kvm_cpu_context *host_ctxt)
> > +{
> > +       DECLARE_REG(u64, pfn, host_ctxt, 1);
> > +       DECLARE_REG(u64, gfn, host_ctxt, 2);
> > +       DECLARE_REG(enum kvm_pgtable_prot, prot, host_ctxt, 3);
> > +       struct pkvm_hyp_vcpu *hyp_vcpu;
> > +       int ret = -EINVAL;
> > +
> > +       if (!is_protected_kvm_enabled())
> > +               goto out;
> > +
> > +       hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
> > +       if (!hyp_vcpu || pkvm_hyp_vcpu_is_protected(hyp_vcpu))
> > +               goto out;
> > +
> > +       ret = pkvm_refill_memcache(hyp_vcpu);
> > +       if (ret)
> > +               goto out;
> > +
> > +       ret = __pkvm_host_share_guest(pfn, gfn, hyp_vcpu, prot);
> > +out:
> > +       cpu_reg(host_ctxt, 1) =  ret;
> > +}
> > +
> >  static void handle___kvm_adjust_pc(struct kvm_cpu_context *host_ctxt)
> >  {
> >         DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
> > @@ -420,6 +453,7 @@ static const hcall_t host_hcall[] = {
> >
> >         HANDLE_FUNC(__pkvm_host_share_hyp),
> >         HANDLE_FUNC(__pkvm_host_unshare_hyp),
> > +       HANDLE_FUNC(__pkvm_host_share_guest),
> >         HANDLE_FUNC(__kvm_adjust_pc),
> >         HANDLE_FUNC(__kvm_vcpu_run),
> >         HANDLE_FUNC(__kvm_flush_vm_context),
> > diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > index 1595081c4f6b..a69d7212b64c 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > @@ -861,6 +861,27 @@ static int hyp_complete_donation(u64 addr,
> >         return pkvm_create_mappings_locked(start, end, prot);
> >  }
> >
> > +static enum pkvm_page_state guest_get_page_state(kvm_pte_t pte, u64 addr)
> > +{
> > +       if (!kvm_pte_valid(pte))
> > +               return PKVM_NOPAGE;
> > +
> > +       return pkvm_getstate(kvm_pgtable_stage2_pte_prot(pte));
> > +}
> > +
> > +static int __guest_check_page_state_range(struct pkvm_hyp_vcpu *vcpu, u64 addr,
> > +                                         u64 size, enum pkvm_page_state state)
> > +{
> > +       struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
> > +       struct check_walk_data d = {
> > +               .desired        = state,
> > +               .get_page_state = guest_get_page_state,
> > +       };
> > +
> > +       hyp_assert_lock_held(&vm->lock);
> > +       return check_page_state_range(&vm->pgt, addr, size, &d);
> > +}
> > +
> >  static int check_share(struct pkvm_mem_share *share)
> >  {
> >         const struct pkvm_mem_transition *tx = &share->tx;
> > @@ -1343,3 +1364,52 @@ int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages)
> >
> >         return ret;
> >  }
> > +
> > +int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu,
> > +                           enum kvm_pgtable_prot prot)
> > +{
> > +       struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
> > +       u64 phys = hyp_pfn_to_phys(pfn);
> > +       u64 ipa = hyp_pfn_to_phys(gfn);
> > +       struct hyp_page *page;
> > +       int ret;
> > +
> > +       if (prot & ~KVM_PGTABLE_PROT_RWX)
> > +               return -EINVAL;
> > +
> > +       ret = range_is_allowed_memory(phys, phys + PAGE_SIZE);
> > +       if (ret)
> > +               return ret;
> > +
> > +       host_lock_component();
> > +       guest_lock_component(vm);
> > +
> > +       ret = __guest_check_page_state_range(vcpu, ipa, PAGE_SIZE, PKVM_NOPAGE);
> > +       if (ret)
> > +               goto unlock;
> > +
> > +       page = hyp_phys_to_page(phys);
> > +       switch (page->host_state) {
> > +       case PKVM_PAGE_OWNED:
> > +               WARN_ON(__host_set_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_SHARED_OWNED));
> > +               break;
> > +       case PKVM_PAGE_SHARED_OWNED:
> > +               /* Only host to np-guest multi-sharing is tolerated */
> 
> Initially I thought the comment was related to the warning below,
> which confused me.

It actually is about the warning below :-)

> Now I think what you're trying to say is that we'll
> allow the share, and the (unrelated to the comment) warning is to
> ensure that the PKVM_PAGE_SHARED_OWNED is consistent with the share
> count.

So, the only case where the host should ever attempt to use
__pkvm_host_share_guest() on a page that is already shared is for a page
already shared *with an np-guest*. The page->host_share_guest_count being
elevated is the easiest way to check that the page is indeed in that
state, hence the warning.

If for example the host was trying to share with an np-guest a page that
is currently shared with the hypervisor, that check would fail. We can
discuss whether or not we would want to allow it, but for now there is
strictly no need for it so I went with the restrictive option. We can
relax that constraint later if need be.

> I think what you should have here, which would work better with the
> comment, is something like:
> 
>                 /* Only host to np-guest multi-sharing is tolerated */
> +               if (pkvm_hyp_vcpu_is_protected(vcpu))
> +                       return -EPERM;
> 
> That would even make the comment unnecessary.

I would prefer not adding this here, handle___pkvm_host_share_guest() in
hyp-main.c already does that for us.

> 
> > +               WARN_ON(!page->host_share_guest_count);
> > +               break;
> > +       default:
> > +               ret = -EPERM;
> > +               goto unlock;
> > +       }
> > +
> > +       WARN_ON(kvm_pgtable_stage2_map(&vm->pgt, ipa, PAGE_SIZE, phys,
> > +                                      pkvm_mkstate(prot, PKVM_PAGE_SHARED_BORROWED),
> > +                                      &vcpu->vcpu.arch.pkvm_memcache, 0));
> > +       page->host_share_guest_count++;
> > +
> > +unlock:
> > +       guest_unlock_component(vm);
> > +       host_unlock_component();
> > +
> > +       return ret;
> > +}
> > diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
> > index d5c23449a64c..d6c61a5e7b6e 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
> > @@ -795,6 +795,13 @@ int __pkvm_teardown_vm(pkvm_handle_t handle)
> >         /* Push the metadata pages to the teardown memcache */
> >         for (idx = 0; idx < hyp_vm->nr_vcpus; ++idx) {
> >                 struct pkvm_hyp_vcpu *hyp_vcpu = hyp_vm->vcpus[idx];
> > +               struct kvm_hyp_memcache *vcpu_mc = &hyp_vcpu->vcpu.arch.pkvm_memcache;
> > +
> > +               while (vcpu_mc->nr_pages) {
> > +                       void *addr = pop_hyp_memcache(vcpu_mc, hyp_phys_to_virt);
> 
> nit: newline
> 
> Cheers,
> /fuad
> 
> 
> 
> > +                       push_hyp_memcache(mc, addr, hyp_virt_to_phys);
> > +                       unmap_donated_memory_noclear(addr, PAGE_SIZE);
> > +               }
> >
> >                 teardown_donated_memory(mc, hyp_vcpu, sizeof(*hyp_vcpu));
> >         }
> > --
> > 2.47.0.338.g60cca15819-goog
> >



* Re: [PATCH v2 04/18] KVM: arm64: Move host page ownership tracking to the hyp vmemmap
  2024-12-10 15:29     ` Quentin Perret
@ 2024-12-10 15:46       ` Fuad Tabba
  0 siblings, 0 replies; 50+ messages in thread
From: Fuad Tabba @ 2024-12-10 15:46 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Vincent Donnefort,
	Sebastian Ene, linux-arm-kernel, kvmarm, linux-kernel

Hi Quentin,

On Tue, 10 Dec 2024 at 15:29, Quentin Perret <qperret@google.com> wrote:
>
> Hey Fuad,
>
> On Tuesday 10 Dec 2024 at 13:02:45 (+0000), Fuad Tabba wrote:
> > Hi Quentin,
> >
> > On Tue, 3 Dec 2024 at 10:37, Quentin Perret <qperret@google.com> wrote:
> > >
> > > We currently store part of the page-tracking state in PTE software bits
> > > for the host, guests and the hypervisor. This is sub-optimal when e.g.
> > > sharing pages as this forces us to break block mappings purely to support
> > > this software tracking. This causes an unnecessarily fragmented stage-2
> > > page-table for the host in particular when it shares pages with Secure,
> > > which can lead to measurable regressions. Moreover, having this state
> > > stored in the page-table forces us to do multiple costly walks on the
> > > page transition path, hence causing overhead.
> > >
> > > In order to work around these problems, move the host-side page-tracking
> > > logic from SW bits in its stage-2 PTEs to the hypervisor's vmemmap.
> > >
> > > Signed-off-by: Quentin Perret <qperret@google.com>
> > > ---
> > >  arch/arm64/kvm/hyp/include/nvhe/memory.h |  6 +-
> > >  arch/arm64/kvm/hyp/nvhe/mem_protect.c    | 94 ++++++++++++++++--------
> > >  arch/arm64/kvm/hyp/nvhe/setup.c          |  7 +-
> > >  3 files changed, 71 insertions(+), 36 deletions(-)
> > >
> > > diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
> > > index 88cb8ff9e769..08f3a0416d4c 100644
> > > --- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
> > > +++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
> > > @@ -8,7 +8,7 @@
> > >  #include <linux/types.h>
> > >
> > >  /*
> > > - * SW bits 0-1 are reserved to track the memory ownership state of each page:
> > > + * Bits 0-1 are reserved to track the memory ownership state of each page:
> > >   *   00: The page is owned exclusively by the page-table owner.
> > >   *   01: The page is owned by the page-table owner, but is shared
> > >   *       with another entity.
> >
> > Not shown in this patch, but a couple of lines below, you might want
> > to update the comment on PKVM_NOPAGE to fix the reference to "PTE's SW
> > bits":
>
> I actually think the comment is still correct -- PKVM_NOPAGE never goes
> in the software bits, with or without this patch, so I figured we could
> leave it as-is. But happy to reword if you have a good idea :)

I see, no, that's fine.
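To illustrate why `PKVM_NOPAGE` never lands in the software bits, here is a small model of a two-bit state field. The shift, the mask and the `model_*` helpers are hypothetical; they only loosely follow the shape of `pkvm_mkstate()`/`pkvm_getstate()` and are not the kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical states: bits 0-1 are real states, bit 2 is a meta-state. */
enum model_page_state {
	MODEL_PAGE_OWNED           = 0,
	MODEL_PAGE_SHARED_OWNED    = 1 << 0,
	MODEL_PAGE_SHARED_BORROWED = 1 << 1,
	MODEL_NOPAGE               = 1 << 2,
};

/* Hypothetical two-bit software field inside a 64-bit prot/PTE value. */
#define MODEL_SW_SHIFT 55
#define MODEL_SW_MASK  (UINT64_C(3) << MODEL_SW_SHIFT)

static_assert((MODEL_SW_MASK >> MODEL_SW_SHIFT) == 3,
	      "the software field is two bits wide");

/* True if @state can be represented in the two software bits. */
static bool model_state_fits_sw_bits(enum model_page_state state)
{
	return ((uint64_t)state & ~UINT64_C(3)) == 0;
}

/* Replace the software field of @prot with @state (excess bits are lost). */
static uint64_t model_mkstate(uint64_t prot, enum model_page_state state)
{
	prot &= ~MODEL_SW_MASK;
	prot |= ((uint64_t)state << MODEL_SW_SHIFT) & MODEL_SW_MASK;
	return prot;
}

/* Extract the state stored in the software field of @prot. */
static enum model_page_state model_getstate(uint64_t prot)
{
	return (enum model_page_state)((prot & MODEL_SW_MASK) >> MODEL_SW_SHIFT);
}
```

Encoding `MODEL_NOPAGE` (bit 2) through the mask silently yields `MODEL_PAGE_OWNED`, so a meta-state of that kind cannot be stored in the two software bits and has to be derived from other information instead.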

> > > /* Meta-states which aren't encoded directly in the PTE's SW bits */
> > > PKVM_NOPAGE = BIT(2),
> >
> > > @@ -44,7 +44,9 @@ static inline enum pkvm_page_state pkvm_getstate(enum kvm_pgtable_prot prot)
> > >  struct hyp_page {
> > >         u16 refcount;
> > >         u8 order;
> > > -       u8 reserved;
> > > +
> > > +       /* Host (non-meta) state. Guarded by the host stage-2 lock. */
> > > +       enum pkvm_page_state host_state : 8;
> > >  };
> > >
> > >  extern u64 __hyp_vmemmap;
> > > diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > > index caba3e4bd09e..1595081c4f6b 100644
> > > --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > > +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > > @@ -201,8 +201,8 @@ static void *guest_s2_zalloc_page(void *mc)
> > >
> > >         memset(addr, 0, PAGE_SIZE);
> > >         p = hyp_virt_to_page(addr);
> > > -       memset(p, 0, sizeof(*p));
> > >         p->refcount = 1;
> > > +       p->order = 0;
> > >
> > >         return addr;
> > >  }
> > > @@ -268,6 +268,7 @@ int kvm_guest_prepare_stage2(struct pkvm_hyp_vm *vm, void *pgd)
> > >
> > >  void reclaim_guest_pages(struct pkvm_hyp_vm *vm, struct kvm_hyp_memcache *mc)
> > >  {
> > > +       struct hyp_page *page;
> > >         void *addr;
> > >
> > >         /* Dump all pgtable pages in the hyp_pool */
> > > @@ -279,7 +280,9 @@ void reclaim_guest_pages(struct pkvm_hyp_vm *vm, struct kvm_hyp_memcache *mc)
> > >         /* Drain the hyp_pool into the memcache */
> > >         addr = hyp_alloc_pages(&vm->pool, 0);
> > >         while (addr) {
> > > -               memset(hyp_virt_to_page(addr), 0, sizeof(struct hyp_page));
> > > +               page = hyp_virt_to_page(addr);
> > > +               page->refcount = 0;
> > > +               page->order = 0;
> > >                 push_hyp_memcache(mc, addr, hyp_virt_to_phys);
> > >                 WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(addr), 1));
> > >                 addr = hyp_alloc_pages(&vm->pool, 0);
> > > @@ -382,19 +385,25 @@ bool addr_is_memory(phys_addr_t phys)
> > >         return !!find_mem_range(phys, &range);
> > >  }
> > >
> > > -static bool addr_is_allowed_memory(phys_addr_t phys)
> > > +static bool is_in_mem_range(u64 addr, struct kvm_mem_range *range)
> > > +{
> > > +       return range->start <= addr && addr < range->end;
> > > +}
> > > +
> > > +static int range_is_allowed_memory(u64 start, u64 end)
> >
> > The name of this function "range_*is*_..." implies that it returns a
> > boolean, which other functions in this file (and patch) with similar
> > names do, but it returns an error instead. Maybe
> > check_range_allowed_memory)?
>
> Ack, I'll rename in v3.
>
> > >  {
> > >         struct memblock_region *reg;
> > >         struct kvm_mem_range range;
> > >
> > > -       reg = find_mem_range(phys, &range);
> > > +       /* Can't check the state of both MMIO and memory regions at once */
> >
> > I don't understand this comment in relation to the code. Could you
> > explain it to me please?
>
> find_mem_range() will iterate the list of memblocks to find the 'range'
> in which @start falls. That might either be in a memblock (so @start is
> memory, and @reg != NULL) or outside of one (so @start is mmio, and
> @reg == NULL). The check right after ensures that @end is in the same
> PA range as @start. IOW, this checks that [start, end[ doesn't overlap
> memory and MMIO, because the following logic wouldn't work for a mixed
> case like that.

I understand now. I think it might be worth elaborating a bit on the
comment to clarify that. What confused me was that the comment
refers to checking state, but it sits in a function that does not care
about page state, i.e., it's not immediately obvious that its primary
callers are __host_check_page_state_range() and other functions
that check the state of a range.
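The check described above can be sketched as standalone C, assuming a sorted, non-overlapping array of memory regions in place of the memblock list. `struct model_memblock`, `model_find_mem_range()` and `model_check_range_allowed_memory()` are hypothetical (the last one borrows the name proposed in review), and the linear search is a simplification of the real lookup.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct model_mem_range {
	uint64_t start;
	uint64_t end;	/* exclusive */
};

struct model_memblock {
	uint64_t base;
	uint64_t size;
	bool nomap;	/* stands in for MEMBLOCK_NOMAP */
};

/*
 * Find the region containing @addr. On a hit, @range is the region and the
 * region is returned; on a miss, @range is the MMIO gap around @addr and
 * NULL is returned. @regs must be sorted and non-overlapping.
 */
static const struct model_memblock *
model_find_mem_range(const struct model_memblock *regs, size_t nr,
		     uint64_t addr, struct model_mem_range *range)
{
	assert(range != NULL);
	range->start = 0;
	range->end = UINT64_MAX;

	for (size_t i = 0; i < nr; i++) {
		uint64_t end = regs[i].base + regs[i].size;

		if (addr < regs[i].base) {
			range->end = regs[i].base;	/* gap before this region */
			return NULL;
		}
		if (addr < end) {
			range->start = regs[i].base;
			range->end = end;
			return &regs[i];
		}
		range->start = end;	/* gap starts after this region */
	}
	return NULL;
}

static bool model_is_in_mem_range(uint64_t addr, const struct model_mem_range *range)
{
	return range->start <= addr && addr < range->end;
}

static int model_check_range_allowed_memory(const struct model_memblock *regs,
					    size_t nr, uint64_t start, uint64_t end)
{
	struct model_mem_range range;
	const struct model_memblock *reg;

	reg = model_find_mem_range(regs, nr, start, &range);

	/* [start, end) must not mix memory and MMIO. */
	if (!model_is_in_mem_range(end - 1, &range))
		return -EINVAL;

	/* MMIO or NOMAP memory is not allowed. */
	if (!reg || reg->nomap)
		return -EPERM;

	return 0;
}
```

For a memory region at [0x1000, 0x3000) and a NOMAP region at [0x8000, 0x9000), a range straddling 0x3000 returns `-EINVAL`, while a range entirely in the MMIO gap or in the NOMAP region returns `-EPERM`.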

> > > +       reg = find_mem_range(start, &range);
> > > +       if (!is_in_mem_range(end - 1, &range))
> > > +               return -EINVAL;
> > >
> > > -       return reg && !(reg->flags & MEMBLOCK_NOMAP);
> > > -}
> > > +       if (!reg || reg->flags & MEMBLOCK_NOMAP)
> > > +               return -EPERM;
> > >
> > > -static bool is_in_mem_range(u64 addr, struct kvm_mem_range *range)
> > > -{
> > > -       return range->start <= addr && addr < range->end;
> > > +       return 0;
> > >  }
> > >
> > >  static bool range_is_memory(u64 start, u64 end)
> > > @@ -454,8 +463,11 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
> > >         if (kvm_pte_valid(pte))
> > >                 return -EAGAIN;
> > >
> > > -       if (pte)
> > > +       if (pte) {
> > > +               WARN_ON(addr_is_memory(addr) &&
> > > +                       !(hyp_phys_to_page(addr)->host_state & PKVM_NOPAGE));
> >
> > nit: since the host state is now an enum, should this just be an
> > equality check rather than an &? This makes it consistent with other
> > checks of pkvm_page_state in this patch too.
>
> We don't currently have a state that is additive to PKVM_NOPAGE, so no
> objection from me.
>
> > >                 return -EPERM;
> > > +       }
> > >
> > >         do {
> > >                 u64 granule = kvm_granule_size(level);
> > > @@ -477,10 +489,29 @@ int host_stage2_idmap_locked(phys_addr_t addr, u64 size,
> > >         return host_stage2_try(__host_stage2_idmap, addr, addr + size, prot);
> > >  }
> > >
> > > +static void __host_update_page_state(phys_addr_t addr, u64 size, enum pkvm_page_state state)
> > > +{
> > > +       phys_addr_t end = addr + size;
> >
> > nit: newline
> >
> > > +       for (; addr < end; addr += PAGE_SIZE)
> > > +               hyp_phys_to_page(addr)->host_state = state;
> > > +}
> > > +
> > >  int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
> > >  {
> > > -       return host_stage2_try(kvm_pgtable_stage2_set_owner, &host_mmu.pgt,
> > > -                              addr, size, &host_s2_pool, owner_id);
> > > +       int ret;
> > > +
> > > +       ret = host_stage2_try(kvm_pgtable_stage2_set_owner, &host_mmu.pgt,
> > > +                             addr, size, &host_s2_pool, owner_id);
> > > +       if (ret || !addr_is_memory(addr))
> > > +               return ret;
> >
> > Can hyp set an owner for an address that isn't memory? Trying to
> > understand why we need to update the host stage2 pagetable but not the
> > hypervisor's vmemmap in that case.
>
> I think the answer is not currently, but we will when we have to, e.g.,
> donate IOMMU registers to EL2 and things of that nature. Note that this
> does require an extension to __host_check_page_state_range() to go query
> the page-table 'the old way' for MMIO addresses, though that isn't done
> in this series. If you think strongly that this is confusing, I'm happy
> to drop that check and we'll add it back with the IOMMU series or
> something like that.

I think it's worth a comment if you're not dropping the check.
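A sketch of the vmemmap update that the `!addr_is_memory(addr)` check guards in `host_stage2_set_owner_locked()` may make the intent easier to see. It assumes a hypothetical flat array `model_vmemmap[]` covering a few pages of memory, with every address above it treated as MMIO; the names are illustrative only, and the stage-2 update that precedes this step in the patch is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u
#define MODEL_NR_PAGES  8u	/* hypothetical: memory covers [0, 8 pages) */

enum model_page_state { MODEL_PAGE_OWNED = 0, MODEL_NOPAGE = 1 << 2 };
enum model_owner_id { MODEL_ID_HOST = 0, MODEL_ID_HYP = 1 };

/* Hypothetical per-page host state; zero-initialised to MODEL_PAGE_OWNED. */
static enum model_page_state model_vmemmap[MODEL_NR_PAGES];

static bool model_addr_is_memory(uint64_t addr)
{
	return addr < (uint64_t)MODEL_NR_PAGES * MODEL_PAGE_SIZE;
}

/* Update the host tracking once the owner of [addr, addr + size) changed. */
static void model_set_owner_tracking(uint64_t addr, uint64_t size,
				     enum model_owner_id owner)
{
	enum model_page_state state;
	uint64_t end = addr + size;

	/* MMIO has no vmemmap entry, so there is nothing to update. */
	if (!model_addr_is_memory(addr))
		return;

	state = owner == MODEL_ID_HOST ? MODEL_PAGE_OWNED : MODEL_NOPAGE;
	for (; addr < end; addr += MODEL_PAGE_SIZE) {
		/* The range is assumed not to mix memory and MMIO. */
		assert(model_addr_is_memory(addr));
		model_vmemmap[addr / MODEL_PAGE_SIZE] = state;
	}
}
```

An MMIO range leaves the array untouched, which corresponds to the patch returning early once the stage-2 update has succeeded.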

>
> > > +
> > > +       /* Don't forget to update the vmemmap tracking for the host */
> > > +       if (owner_id == PKVM_ID_HOST)
> > > +               __host_update_page_state(addr, size, PKVM_PAGE_OWNED);
> > > +       else
> > > +               __host_update_page_state(addr, size, PKVM_NOPAGE);
> > > +
> > > +       return 0;
> > >  }
> > >
> > >  static bool host_stage2_force_pte_cb(u64 addr, u64 end, enum kvm_pgtable_prot prot)
> > > @@ -604,35 +635,38 @@ static int check_page_state_range(struct kvm_pgtable *pgt, u64 addr, u64 size,
> > >         return kvm_pgtable_walk(pgt, addr, size, &walker);
> > >  }
> > >
> > > -static enum pkvm_page_state host_get_page_state(kvm_pte_t pte, u64 addr)
> > > -{
> > > -       if (!addr_is_allowed_memory(addr))
> > > -               return PKVM_NOPAGE;
> > > -
> > > -       if (!kvm_pte_valid(pte) && pte)
> > > -               return PKVM_NOPAGE;
> > > -
> > > -       return pkvm_getstate(kvm_pgtable_stage2_pte_prot(pte));
> > > -}
> > > -
> > >  static int __host_check_page_state_range(u64 addr, u64 size,
> > >                                          enum pkvm_page_state state)
> > >  {
> > > -       struct check_walk_data d = {
> > > -               .desired        = state,
> > > -               .get_page_state = host_get_page_state,
> > > -       };
> > > +       u64 end = addr + size;
> > > +       int ret;
> > > +
> > > +       ret = range_is_allowed_memory(addr, end);
> > > +       if (ret)
> > > +               return ret;
> > >
> > >         hyp_assert_lock_held(&host_mmu.lock);
> > > -       return check_page_state_range(&host_mmu.pgt, addr, size, &d);
> > > +       for (; addr < end; addr += PAGE_SIZE) {
> > > +               if (hyp_phys_to_page(addr)->host_state != state)
> > > +                       return -EPERM;
> > > +       }
> > > +
> > > +       return 0;
> > >  }
> > >
> > >  static int __host_set_page_state_range(u64 addr, u64 size,
> > >                                        enum pkvm_page_state state)
> > >  {
> > > -       enum kvm_pgtable_prot prot = pkvm_mkstate(PKVM_HOST_MEM_PROT, state);
> > > +       if (hyp_phys_to_page(addr)->host_state & PKVM_NOPAGE) {
> >
> > Same nit as above regarding checking for PKVM_NOPAGE
> >
> > Cheers,
> > /fuad
> >
> >
> > > +               int ret = host_stage2_idmap_locked(addr, size, PKVM_HOST_MEM_PROT);
> > >
> > > -       return host_stage2_idmap_locked(addr, size, prot);
> > > +               if (ret)
> > > +                       return ret;
> > > +       }
> > > +
> > > +       __host_update_page_state(addr, size, state);
> > > +
> > > +       return 0;
> > >  }
> > >
> > >  static int host_request_owned_transition(u64 *completer_addr,
> > > diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
> > > index cbdd18cd3f98..7e04d1c2a03d 100644
> > > --- a/arch/arm64/kvm/hyp/nvhe/setup.c
> > > +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
> > > @@ -180,7 +180,6 @@ static void hpool_put_page(void *addr)
> > >  static int fix_host_ownership_walker(const struct kvm_pgtable_visit_ctx *ctx,
> > >                                      enum kvm_pgtable_walk_flags visit)
> > >  {
> > > -       enum kvm_pgtable_prot prot;
> > >         enum pkvm_page_state state;
> > >         phys_addr_t phys;
> > >
> > > @@ -203,16 +202,16 @@ static int fix_host_ownership_walker(const struct kvm_pgtable_visit_ctx *ctx,
> > >         case PKVM_PAGE_OWNED:
> > >                 return host_stage2_set_owner_locked(phys, PAGE_SIZE, PKVM_ID_HYP);
> > >         case PKVM_PAGE_SHARED_OWNED:
> > > -               prot = pkvm_mkstate(PKVM_HOST_MEM_PROT, PKVM_PAGE_SHARED_BORROWED);
> > > +               hyp_phys_to_page(phys)->host_state = PKVM_PAGE_SHARED_BORROWED;
> > >                 break;
> > >         case PKVM_PAGE_SHARED_BORROWED:
> > > -               prot = pkvm_mkstate(PKVM_HOST_MEM_PROT, PKVM_PAGE_SHARED_OWNED);
> > > +               hyp_phys_to_page(phys)->host_state = PKVM_PAGE_SHARED_OWNED;
> > >                 break;
> > >         default:
> > >                 return -EINVAL;
> > >         }
> > >
> > > -       return host_stage2_idmap_locked(phys, PAGE_SIZE, prot);
> > > +       return 0;
> > >  }
> > >
> > >  static int fix_hyp_pgtable_refcnt_walker(const struct kvm_pgtable_visit_ctx *ctx,
> > > --
> > > 2.47.0.338.g60cca15819-goog
> > >



* Re: [PATCH v2 10/18] KVM: arm64: Introduce __pkvm_host_share_guest()
  2024-12-10 15:41     ` Quentin Perret
@ 2024-12-10 15:51       ` Fuad Tabba
  2024-12-11  9:58         ` Quentin Perret
  0 siblings, 1 reply; 50+ messages in thread
From: Fuad Tabba @ 2024-12-10 15:51 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Vincent Donnefort,
	Sebastian Ene, linux-arm-kernel, kvmarm, linux-kernel

On Tue, 10 Dec 2024 at 15:41, Quentin Perret <qperret@google.com> wrote:
>
> On Tuesday 10 Dec 2024 at 13:58:42 (+0000), Fuad Tabba wrote:
> > Hi Quentin,
> >
> > On Tue, 3 Dec 2024 at 10:37, Quentin Perret <qperret@google.com> wrote:
> > >
> > > In preparation for handling guest stage-2 mappings at EL2, introduce a
> > > new pKVM hypercall allowing to share pages with non-protected guests.
> > >
> > > Signed-off-by: Quentin Perret <qperret@google.com>
> > > ---
> > >  arch/arm64/include/asm/kvm_asm.h              |  1 +
> > >  arch/arm64/include/asm/kvm_host.h             |  3 +
> > >  arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  1 +
> > >  arch/arm64/kvm/hyp/include/nvhe/memory.h      |  2 +
> > >  arch/arm64/kvm/hyp/nvhe/hyp-main.c            | 34 +++++++++
> > >  arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 70 +++++++++++++++++++
> > >  arch/arm64/kvm/hyp/nvhe/pkvm.c                |  7 ++
> > >  7 files changed, 118 insertions(+)
> > >
> > > diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> > > index 89c0fac69551..449337f5b2a3 100644
> > > --- a/arch/arm64/include/asm/kvm_asm.h
> > > +++ b/arch/arm64/include/asm/kvm_asm.h
> > > @@ -65,6 +65,7 @@ enum __kvm_host_smccc_func {
> > >         /* Hypercalls available after pKVM finalisation */
> > >         __KVM_HOST_SMCCC_FUNC___pkvm_host_share_hyp,
> > >         __KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_hyp,
> > > +       __KVM_HOST_SMCCC_FUNC___pkvm_host_share_guest,
> > >         __KVM_HOST_SMCCC_FUNC___kvm_adjust_pc,
> > >         __KVM_HOST_SMCCC_FUNC___kvm_vcpu_run,
> > >         __KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
> > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > > index e18e9244d17a..f75988e3515b 100644
> > > --- a/arch/arm64/include/asm/kvm_host.h
> > > +++ b/arch/arm64/include/asm/kvm_host.h
> > > @@ -771,6 +771,9 @@ struct kvm_vcpu_arch {
> > >         /* Cache some mmu pages needed inside spinlock regions */
> > >         struct kvm_mmu_memory_cache mmu_page_cache;
> > >
> > > +       /* Pages to be donated to pkvm/EL2 if it runs out */
> >
> > Runs out of what? :) I'm being facetious, it's just that the comment
> > is a bit unclear.
>
>         /* Pages to top-up the pKVM/EL2 guest pool */
>
> Is that any better?
>
> > > +       struct kvm_hyp_memcache pkvm_memcache;
> > > +
> > >         /* Virtual SError ESR to restore when HCR_EL2.VSE is set */
> > >         u64 vsesr_el2;
> > >
> > > diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > > index 25038ac705d8..a7976e50f556 100644
> > > --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > > +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > > @@ -39,6 +39,7 @@ int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages);
> > >  int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
> > >  int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
> > >  int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
> > > +int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu, enum kvm_pgtable_prot prot);
> > >
> > >  bool addr_is_memory(phys_addr_t phys);
> > >  int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
> > > diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
> > > index 08f3a0416d4c..457318215155 100644
> > > --- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
> > > +++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
> > > @@ -47,6 +47,8 @@ struct hyp_page {
> > >
> > >         /* Host (non-meta) state. Guarded by the host stage-2 lock. */
> > >         enum pkvm_page_state host_state : 8;
> > > +
> > > +       u32 host_share_guest_count;
> > >  };
> > >
> > >  extern u64 __hyp_vmemmap;
> > > diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > > index 95d78db315b3..d659462fbf5d 100644
> > > --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > > +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > > @@ -211,6 +211,39 @@ static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
> > >         cpu_reg(host_ctxt, 1) =  ret;
> > >  }
> > >
> > > +static int pkvm_refill_memcache(struct pkvm_hyp_vcpu *hyp_vcpu)
> > > +{
> > > +       struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
> > > +
> > > +       return refill_memcache(&hyp_vcpu->vcpu.arch.pkvm_memcache,
> > > +                              host_vcpu->arch.pkvm_memcache.nr_pages,
> > > +                              &host_vcpu->arch.pkvm_memcache);
> > > +}
> > > +
> > > +static void handle___pkvm_host_share_guest(struct kvm_cpu_context *host_ctxt)
> > > +{
> > > +       DECLARE_REG(u64, pfn, host_ctxt, 1);
> > > +       DECLARE_REG(u64, gfn, host_ctxt, 2);
> > > +       DECLARE_REG(enum kvm_pgtable_prot, prot, host_ctxt, 3);
> > > +       struct pkvm_hyp_vcpu *hyp_vcpu;
> > > +       int ret = -EINVAL;
> > > +
> > > +       if (!is_protected_kvm_enabled())
> > > +               goto out;
> > > +
> > > +       hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
> > > +       if (!hyp_vcpu || pkvm_hyp_vcpu_is_protected(hyp_vcpu))
> > > +               goto out;
> > > +
> > > +       ret = pkvm_refill_memcache(hyp_vcpu);
> > > +       if (ret)
> > > +               goto out;
> > > +
> > > +       ret = __pkvm_host_share_guest(pfn, gfn, hyp_vcpu, prot);
> > > +out:
> > > +       cpu_reg(host_ctxt, 1) =  ret;
> > > +}
> > > +
> > >  static void handle___kvm_adjust_pc(struct kvm_cpu_context *host_ctxt)
> > >  {
> > >         DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
> > > @@ -420,6 +453,7 @@ static const hcall_t host_hcall[] = {
> > >
> > >         HANDLE_FUNC(__pkvm_host_share_hyp),
> > >         HANDLE_FUNC(__pkvm_host_unshare_hyp),
> > > +       HANDLE_FUNC(__pkvm_host_share_guest),
> > >         HANDLE_FUNC(__kvm_adjust_pc),
> > >         HANDLE_FUNC(__kvm_vcpu_run),
> > >         HANDLE_FUNC(__kvm_flush_vm_context),
> > > diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > > index 1595081c4f6b..a69d7212b64c 100644
> > > --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > > +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > > @@ -861,6 +861,27 @@ static int hyp_complete_donation(u64 addr,
> > >         return pkvm_create_mappings_locked(start, end, prot);
> > >  }
> > >
> > > +static enum pkvm_page_state guest_get_page_state(kvm_pte_t pte, u64 addr)
> > > +{
> > > +       if (!kvm_pte_valid(pte))
> > > +               return PKVM_NOPAGE;
> > > +
> > > +       return pkvm_getstate(kvm_pgtable_stage2_pte_prot(pte));
> > > +}
> > > +
> > > +static int __guest_check_page_state_range(struct pkvm_hyp_vcpu *vcpu, u64 addr,
> > > +                                         u64 size, enum pkvm_page_state state)
> > > +{
> > > +       struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
> > > +       struct check_walk_data d = {
> > > +               .desired        = state,
> > > +               .get_page_state = guest_get_page_state,
> > > +       };
> > > +
> > > +       hyp_assert_lock_held(&vm->lock);
> > > +       return check_page_state_range(&vm->pgt, addr, size, &d);
> > > +}
> > > +
> > >  static int check_share(struct pkvm_mem_share *share)
> > >  {
> > >         const struct pkvm_mem_transition *tx = &share->tx;
> > > @@ -1343,3 +1364,52 @@ int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages)
> > >
> > >         return ret;
> > >  }
> > > +
> > > +int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu,
> > > +                           enum kvm_pgtable_prot prot)
> > > +{
> > > +       struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
> > > +       u64 phys = hyp_pfn_to_phys(pfn);
> > > +       u64 ipa = hyp_pfn_to_phys(gfn);
> > > +       struct hyp_page *page;
> > > +       int ret;
> > > +
> > > +       if (prot & ~KVM_PGTABLE_PROT_RWX)
> > > +               return -EINVAL;
> > > +
> > > +       ret = range_is_allowed_memory(phys, phys + PAGE_SIZE);
> > > +       if (ret)
> > > +               return ret;
> > > +
> > > +       host_lock_component();
> > > +       guest_lock_component(vm);
> > > +
> > > +       ret = __guest_check_page_state_range(vcpu, ipa, PAGE_SIZE, PKVM_NOPAGE);
> > > +       if (ret)
> > > +               goto unlock;
> > > +
> > > +       page = hyp_phys_to_page(phys);
> > > +       switch (page->host_state) {
> > > +       case PKVM_PAGE_OWNED:
> > > +               WARN_ON(__host_set_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_SHARED_OWNED));
> > > +               break;
> > > +       case PKVM_PAGE_SHARED_OWNED:
> > > +               /* Only host to np-guest multi-sharing is tolerated */
> >
> > Initially I thought the comment was related to the warning below,
> > which confused me.
>
> It actually is about the warning below :-)
>
> > Now I think what you're trying to say is that we'll
> > allow the share, and the (unrelated to the comment) warning is to
> > ensure that the PKVM_PAGE_SHARED_OWNED is consistent with the share
> > count.
>
> So, the only case where the host should ever attempt to use
> __pkvm_host_share_guest() on a page that is already shared is for a page
> already shared *with an np-guest*. The page->host_share_guest_count being
> elevated is the easiest way to check that the page is indeed in that
> state, hence the warning.
>
> If for example the host was trying to share with an np-guest a page that
> is currently shared with the hypervisor, that check would fail. We can
> discuss whether or not we would want to allow it, but for now there is
> strictly no need for it so I went with the restrictive option. We can
> relax that constraint later if need be.
>
> > I think what you should have here, which would work better with the
> > comment, is something like:
> >
> >                 /* Only host to np-guest multi-sharing is tolerated */
> > +               if (pkvm_hyp_vcpu_is_protected(vcpu))
> > +                       return -EPERM;
> >
> > That would even make the comment unnecessary.
>
> I would prefer not adding this here, handle___pkvm_host_share_guest() in
> hyp-main.c already does that for us.

I understand now, and I agree that an additional check isn't
necessary. Could you clarify the comment though? It's the word "only"
that threw me off, since to me it implied that the check was enforcing
the word "only". Maybe:

>                 /* Tolerate host to np-guest multi-sharing. */


Thanks,
/fuad

> >
> > > +               WARN_ON(!page->host_share_guest_count);
> > > +               break;
> > > +       default:
> > > +               ret = -EPERM;
> > > +               goto unlock;
> > > +       }
> > > +
> > > +       WARN_ON(kvm_pgtable_stage2_map(&vm->pgt, ipa, PAGE_SIZE, phys,
> > > +                                      pkvm_mkstate(prot, PKVM_PAGE_SHARED_BORROWED),
> > > +                                      &vcpu->vcpu.arch.pkvm_memcache, 0));
> > > +       page->host_share_guest_count++;
> > > +
> > > +unlock:
> > > +       guest_unlock_component(vm);
> > > +       host_unlock_component();
> > > +
> > > +       return ret;
> > > +}
> > > diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
> > > index d5c23449a64c..d6c61a5e7b6e 100644
> > > --- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
> > > +++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
> > > @@ -795,6 +795,13 @@ int __pkvm_teardown_vm(pkvm_handle_t handle)
> > >         /* Push the metadata pages to the teardown memcache */
> > >         for (idx = 0; idx < hyp_vm->nr_vcpus; ++idx) {
> > >                 struct pkvm_hyp_vcpu *hyp_vcpu = hyp_vm->vcpus[idx];
> > > +               struct kvm_hyp_memcache *vcpu_mc = &hyp_vcpu->vcpu.arch.pkvm_memcache;
> > > +
> > > +               while (vcpu_mc->nr_pages) {
> > > +                       void *addr = pop_hyp_memcache(vcpu_mc, hyp_phys_to_virt);
> >
> > nit: newline
> >
> > Cheers,
> > /fuad
> >
> >
> >
> > > +                       push_hyp_memcache(mc, addr, hyp_virt_to_phys);
> > > +                       unmap_donated_memory_noclear(addr, PAGE_SIZE);
> > > +               }
> > >
> > >                 teardown_donated_memory(mc, hyp_vcpu, sizeof(*hyp_vcpu));
> > >         }
> > > --
> > > 2.47.0.338.g60cca15819-goog
> > >



* Re: [PATCH v2 11/18] KVM: arm64: Introduce __pkvm_host_unshare_guest()
  2024-12-10 14:41   ` Fuad Tabba
@ 2024-12-10 15:53     ` Quentin Perret
  2024-12-10 15:57       ` Fuad Tabba
  0 siblings, 1 reply; 50+ messages in thread
From: Quentin Perret @ 2024-12-10 15:53 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Vincent Donnefort,
	Sebastian Ene, linux-arm-kernel, kvmarm, linux-kernel

On Tuesday 10 Dec 2024 at 14:41:12 (+0000), Fuad Tabba wrote:
> Hi Quentin,
> 
> On Tue, 3 Dec 2024 at 10:38, Quentin Perret <qperret@google.com> wrote:
> >
> > In preparation for letting the host unmap pages from non-protected
> > guests, introduce a new hypercall implementing the host-unshare-guest
> > transition.
> >
> > Signed-off-by: Quentin Perret <qperret@google.com>
> > ---
> >  arch/arm64/include/asm/kvm_asm.h              |  1 +
> >  arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  1 +
> >  arch/arm64/kvm/hyp/include/nvhe/pkvm.h        |  5 ++
> >  arch/arm64/kvm/hyp/nvhe/hyp-main.c            | 24 +++++++
> >  arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 67 +++++++++++++++++++
> >  5 files changed, 98 insertions(+)
> >
> > diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> > index 449337f5b2a3..0b6c4d325134 100644
> > --- a/arch/arm64/include/asm/kvm_asm.h
> > +++ b/arch/arm64/include/asm/kvm_asm.h
> > @@ -66,6 +66,7 @@ enum __kvm_host_smccc_func {
> >         __KVM_HOST_SMCCC_FUNC___pkvm_host_share_hyp,
> >         __KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_hyp,
> >         __KVM_HOST_SMCCC_FUNC___pkvm_host_share_guest,
> > +       __KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_guest,
> >         __KVM_HOST_SMCCC_FUNC___kvm_adjust_pc,
> >         __KVM_HOST_SMCCC_FUNC___kvm_vcpu_run,
> >         __KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
> > diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > index a7976e50f556..e528a42ed60e 100644
> > --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > @@ -40,6 +40,7 @@ int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
> >  int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
> >  int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
> >  int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu, enum kvm_pgtable_prot prot);
> > +int __pkvm_host_unshare_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm);
> 
> The parameters of share_guest and unshare_guest are quite different. I
> think unshare's approach, taking the hyp_vm rather than the hyp_vcpu,
> makes more sense. Still, I think that one of the two should change.

Hmm, so that is actually a bit difficult. __pkvm_host_share_guest() is
guaranteed to always be called when a vCPU is loaded, and it needs to
use the per-vCPU memcache so we can't just give it the pkvm_hyp_vm as
is.

And on the other hand, __pkvm_host_unshare_guest() can end up being
called from an MMU notifier where no vCPU is loaded, so it's not clear
which vCPU it should be using. We also just don't need to access
per-vCPU data structures on that path (the unmap call can only free
page-table pages, which are always put back into the per-guest pool
directly, not into a memcache).

> >  bool addr_is_memory(phys_addr_t phys);
> >  int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
> > diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
> > index be52c5b15e21..5dfc9ece9aa5 100644
> > --- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
> > +++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
> > @@ -64,6 +64,11 @@ static inline bool pkvm_hyp_vcpu_is_protected(struct pkvm_hyp_vcpu *hyp_vcpu)
> >         return vcpu_is_protected(&hyp_vcpu->vcpu);
> >  }
> >
> > +static inline bool pkvm_hyp_vm_is_protected(struct pkvm_hyp_vm *hyp_vm)
> > +{
> > +       return kvm_vm_is_protected(&hyp_vm->kvm);
> > +}
> > +
> >  void pkvm_hyp_vm_table_init(void *tbl);
> >
> >  int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
> > diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > index d659462fbf5d..04a9053ae1d5 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > @@ -244,6 +244,29 @@ static void handle___pkvm_host_share_guest(struct kvm_cpu_context *host_ctxt)
> >         cpu_reg(host_ctxt, 1) =  ret;
> >  }
> >
> > +static void handle___pkvm_host_unshare_guest(struct kvm_cpu_context *host_ctxt)
> > +{
> > +       DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
> > +       DECLARE_REG(u64, gfn, host_ctxt, 2);
> > +       struct pkvm_hyp_vm *hyp_vm;
> > +       int ret = -EINVAL;
> > +
> > +       if (!is_protected_kvm_enabled())
> > +               goto out;
> > +
> > +       hyp_vm = get_pkvm_hyp_vm(handle);
> > +       if (!hyp_vm)
> > +               goto out;
> > +       if (pkvm_hyp_vm_is_protected(hyp_vm))
> > +               goto put_hyp_vm;
> 
> bikeshedding: is -EINVAL the best return value, or might -EPERM be
> better if the VM is protected?

-EINVAL makes the code marginally simpler, especially given that we have
this pattern all across hyp-main.c, so I have a minor personal
preference for keeping it as-is, but no strong opinion really. This
really shouldn't ever be hit at run-time, modulo major bugs or a malicious
host, so it's probably not a huge deal if -EINVAL isn't particularly accurate.

> > +
> > +       ret = __pkvm_host_unshare_guest(gfn, hyp_vm);
> > +put_hyp_vm:
> > +       put_pkvm_hyp_vm(hyp_vm);
> > +out:
> > +       cpu_reg(host_ctxt, 1) =  ret;
> > +}
> > +
> >  static void handle___kvm_adjust_pc(struct kvm_cpu_context *host_ctxt)
> >  {
> >         DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
> > @@ -454,6 +477,7 @@ static const hcall_t host_hcall[] = {
> >         HANDLE_FUNC(__pkvm_host_share_hyp),
> >         HANDLE_FUNC(__pkvm_host_unshare_hyp),
> >         HANDLE_FUNC(__pkvm_host_share_guest),
> > +       HANDLE_FUNC(__pkvm_host_unshare_guest),
> >         HANDLE_FUNC(__kvm_adjust_pc),
> >         HANDLE_FUNC(__kvm_vcpu_run),
> >         HANDLE_FUNC(__kvm_flush_vm_context),
> > diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > index a69d7212b64c..aa27a3e42e5e 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > @@ -1413,3 +1413,70 @@ int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu,
> >
> >         return ret;
> >  }
> > +
> > +static int __check_host_unshare_guest(struct pkvm_hyp_vm *vm, u64 *__phys, u64 ipa)
> 
> nit: sometimes (in this and other patches) you use vm to refer to
> pkvm_hyp_vm, and other times you use hyp_vm. Makes grepping/searching
> a bit more tricky.

Ack, I'll do a pass on the series to improve the consistency.

> > +{
> > +       enum pkvm_page_state state;
> > +       struct hyp_page *page;
> > +       kvm_pte_t pte;
> > +       u64 phys;
> > +       s8 level;
> > +       int ret;
> > +
> > +       ret = kvm_pgtable_get_leaf(&vm->pgt, ipa, &pte, &level);
> > +       if (ret)
> > +               return ret;
> > +       if (level != KVM_PGTABLE_LAST_LEVEL)
> > +               return -E2BIG;
> > +       if (!kvm_pte_valid(pte))
> > +               return -ENOENT;
> > +
> > +       state = guest_get_page_state(pte, ipa);
> > +       if (state != PKVM_PAGE_SHARED_BORROWED)
> > +               return -EPERM;
> > +
> > +       phys = kvm_pte_to_phys(pte);
> > +       ret = range_is_allowed_memory(phys, phys + PAGE_SIZE);
> > +       if (WARN_ON(ret))
> > +               return ret;
> > +
> > +       page = hyp_phys_to_page(phys);
> > +       if (page->host_state != PKVM_PAGE_SHARED_OWNED)
> > +               return -EPERM;
> > +       if (WARN_ON(!page->host_share_guest_count))
> > +               return -EINVAL;
> > +
> > +       *__phys = phys;
> > +
> > +       return 0;
> > +}
> > +
> > +int __pkvm_host_unshare_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm)
> > +{
> > +       u64 ipa = hyp_pfn_to_phys(gfn);
> > +       struct hyp_page *page;
> > +       u64 phys;
> > +       int ret;
> > +
> > +       host_lock_component();
> > +       guest_lock_component(hyp_vm);
> > +
> > +       ret = __check_host_unshare_guest(hyp_vm, &phys, ipa);
> > +       if (ret)
> > +               goto unlock;
> > +
> > +       ret = kvm_pgtable_stage2_unmap(&hyp_vm->pgt, ipa, PAGE_SIZE);
> > +       if (ret)
> > +               goto unlock;
> > +
> > +       page = hyp_phys_to_page(phys);
> > +       page->host_share_guest_count--;
> > +       if (!page->host_share_guest_count)
> > +               WARN_ON(__host_set_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_OWNED));
> > +
> > +unlock:
> > +       guest_unlock_component(hyp_vm);
> > +       host_unlock_component();
> > +
> > +       return ret;
> > +}
> > --
> > 2.47.0.338.g60cca15819-goog
> >



* Re: [PATCH v2 11/18] KVM: arm64: Introduce __pkvm_host_unshare_guest()
  2024-12-10 15:53     ` Quentin Perret
@ 2024-12-10 15:57       ` Fuad Tabba
  0 siblings, 0 replies; 50+ messages in thread
From: Fuad Tabba @ 2024-12-10 15:57 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Vincent Donnefort,
	Sebastian Ene, linux-arm-kernel, kvmarm, linux-kernel

Hi Quentin,

On Tue, 10 Dec 2024 at 15:53, Quentin Perret <qperret@google.com> wrote:
>
> On Tuesday 10 Dec 2024 at 14:41:12 (+0000), Fuad Tabba wrote:
> > Hi Quentin,
> >
> > On Tue, 3 Dec 2024 at 10:38, Quentin Perret <qperret@google.com> wrote:
> > >
> > > In preparation for letting the host unmap pages from non-protected
> > > guests, introduce a new hypercall implementing the host-unshare-guest
> > > transition.
> > >
> > > Signed-off-by: Quentin Perret <qperret@google.com>
> > > [...]
> > >  int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu, enum kvm_pgtable_prot prot);
> > > +int __pkvm_host_unshare_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm);
> >
> > The parameters of share_guest and unshare_guest are quite different. I
> > think unshare's approach, taking the hyp_vm rather than the hyp_vcpu,
> > makes more sense. Still, I think that one of the two should change.
>
> Hmm, so that is actually a bit difficult. __pkvm_host_share_guest() is
> guaranteed to always be called when a vCPU is loaded, and it needs to
> use the per-vCPU memcache so we can't just give it the pkvm_hyp_vm as
> is.
>
> And on the other hand, __pkvm_host_unshare_guest() can end up being
> called from an MMU notifier where no vCPU is loaded, so it's not clear
> which vCPU it should be using. We also just don't need to access
> per-vCPU data structures on that path (the unmap call can only free
> page-table pages, which are always put back into the per-guest pool
> directly, not into a memcache).

I understand. That makes sense.

> > > [...]
> > > +static void handle___pkvm_host_unshare_guest(struct kvm_cpu_context *host_ctxt)
> > > +{
> > > +       DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
> > > +       DECLARE_REG(u64, gfn, host_ctxt, 2);
> > > +       struct pkvm_hyp_vm *hyp_vm;
> > > +       int ret = -EINVAL;
> > > +
> > > +       if (!is_protected_kvm_enabled())
> > > +               goto out;
> > > +
> > > +       hyp_vm = get_pkvm_hyp_vm(handle);
> > > +       if (!hyp_vm)
> > > +               goto out;
> > > +       if (pkvm_hyp_vm_is_protected(hyp_vm))
> > > +               goto put_hyp_vm;
> >
> > bikeshedding: is -EINVAL the best return value, or might -EPERM be
> > better if the VM is protected?
>
> -EINVAL makes the code marginally simpler, especially given that we have
> this pattern all across hyp-main.c, so I have a minor personal
> preference for keeping it as-is, but no strong opinion really. This
> really shouldn't ever be hit at run-time, modulo major bugs or a malicious
> host, so it's probably not a huge deal if -EINVAL isn't particularly accurate.

That's fine.

> > > [...]
> > > +static int __check_host_unshare_guest(struct pkvm_hyp_vm *vm, u64 *__phys, u64 ipa)
> >
> > nit: sometimes (in this and other patches) you use vm to refer to
> > pkvm_hyp_vm, and other times you use hyp_vm. Makes grepping/searching
> > a bit more tricky.
>
> Ack, I'll do a pass on the series to improve the consistency.

Thanks!
/fuad




* Re: [PATCH v2 13/18] KVM: arm64: Introduce __pkvm_host_wrprotect_guest()
  2024-12-10 15:06   ` Fuad Tabba
@ 2024-12-10 19:38     ` Quentin Perret
  0 siblings, 0 replies; 50+ messages in thread
From: Quentin Perret @ 2024-12-10 19:38 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Vincent Donnefort,
	Sebastian Ene, linux-arm-kernel, kvmarm, linux-kernel

On Tuesday 10 Dec 2024 at 15:06:53 (+0000), Fuad Tabba wrote:
> > +static void handle___pkvm_host_wrprotect_guest(struct kvm_cpu_context *host_ctxt)
> > +{
> > +       DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
> > +       DECLARE_REG(u64, gfn, host_ctxt, 2);
> > +       struct pkvm_hyp_vm *hyp_vm;
> > +       int ret = -EINVAL;
> > +
> > +       if (!is_protected_kvm_enabled())
> > +               goto out;
> > +
> > +       hyp_vm = get_pkvm_hyp_vm(handle);
> > +       if (!hyp_vm)
> > +               goto out;
> > +       if (pkvm_hyp_vm_is_protected(hyp_vm))
> > +               goto put_hyp_vm;
> 
> These checks are (unsurprisingly) the same for all these functions.
> Does it make sense to have a helper do these checks?

Yup, that makes sense and should simplify the error handling on all the
call sites. I'll probably call that get_np_pkvm_hyp_vm() or something
along those lines and shove it in pkvm.c in v3.



* Re: [PATCH v2 14/18] KVM: arm64: Introduce __pkvm_host_test_clear_young_guest()
  2024-12-10 15:11   ` Fuad Tabba
@ 2024-12-10 19:39     ` Quentin Perret
  0 siblings, 0 replies; 50+ messages in thread
From: Quentin Perret @ 2024-12-10 19:39 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Vincent Donnefort,
	Sebastian Ene, linux-arm-kernel, kvmarm, linux-kernel

On Tuesday 10 Dec 2024 at 15:11:53 (+0000), Fuad Tabba wrote:
> > diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > index 8658b5932473..554ce31882e6 100644
> > --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > @@ -43,6 +43,7 @@ int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu, enum k
> >  int __pkvm_host_unshare_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm);
> >  int __pkvm_host_relax_guest_perms(u64 gfn, enum kvm_pgtable_prot prot, struct pkvm_hyp_vcpu *vcpu);
> >  int __pkvm_host_wrprotect_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm);
> > +int __pkvm_host_test_clear_young_guest(u64 gfn, bool mkold, struct pkvm_hyp_vm *vm);
> 
> While I'm piling on the function names/parameters: some functions have
> _guest as a suffix (e.g., this one), while others have it in the
> middle (__pkvm_host_relax_guest_perms). I guess
> __pkvm_host_relax_guest_perms is the odd one out. Could you rename it?

Right, 'relax_guest_perms' felt more natural to me, but consistency
should take precedence, so I'll rename :-)



* Re: [PATCH v2 15/18] KVM: arm64: Introduce __pkvm_host_mkyoung_guest()
  2024-12-10 15:14   ` Fuad Tabba
@ 2024-12-10 19:46     ` Quentin Perret
  2024-12-11 10:11       ` Fuad Tabba
  0 siblings, 1 reply; 50+ messages in thread
From: Quentin Perret @ 2024-12-10 19:46 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Vincent Donnefort,
	Sebastian Ene, linux-arm-kernel, kvmarm, linux-kernel

On Tuesday 10 Dec 2024 at 15:14:03 (+0000), Fuad Tabba wrote:
> > +int __pkvm_host_mkyoung_guest(u64 gfn, struct pkvm_hyp_vcpu *vcpu)
> > +{
> > +       struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
> > +       u64 ipa = hyp_pfn_to_phys(gfn);
> > +       u64 phys;
> > +       int ret;
> > +
> > +       host_lock_component();
> > +       guest_lock_component(vm);
> > +
> > +       ret = __check_host_unshare_guest(vm, &phys, ipa);
> 
> While I'm bikeshedding some more, does the name
> __check_host_unshare_guest() make sense? Should it be something like
> __check_host_changeperm_guest(), or something along those lines? (feel
> free to ignore this :) )

I understand the comment, but I'm not a huge fan of 'changeperm', as that
sounds like we're only allowing permission changes, whereas we use this
all over the place. Maybe __check_host_is_shared_guest()? Naming is
hard, so happy to take suggestions :-)



* Re: [PATCH v2 12/18] KVM: arm64: Introduce __pkvm_host_relax_guest_perms()
  2024-12-10 14:56   ` Fuad Tabba
@ 2024-12-11  8:57     ` Quentin Perret
  0 siblings, 0 replies; 50+ messages in thread
From: Quentin Perret @ 2024-12-11  8:57 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Vincent Donnefort,
	Sebastian Ene, linux-arm-kernel, kvmarm, linux-kernel

On Tuesday 10 Dec 2024 at 14:56:05 (+0000), Fuad Tabba wrote:
> Hi Quentin,
> 
> On Tue, 3 Dec 2024 at 10:38, Quentin Perret <qperret@google.com> wrote:
> >
> > Introduce a new hypercall allowing the host to relax the stage-2
> > permissions of mappings in a non-protected guest page-table. It will be
> > used later once we start allowing RO memslots and dirty logging.
> >
> > Signed-off-by: Quentin Perret <qperret@google.com>
> > ---
> >  arch/arm64/include/asm/kvm_asm.h              |  1 +
> >  arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  1 +
> >  arch/arm64/kvm/hyp/nvhe/hyp-main.c            | 20 ++++++++++++++++
> >  arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 23 +++++++++++++++++++
> >  4 files changed, 45 insertions(+)
> >
> > diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> > index 0b6c4d325134..5d51933e44fb 100644
> > --- a/arch/arm64/include/asm/kvm_asm.h
> > +++ b/arch/arm64/include/asm/kvm_asm.h
> > @@ -67,6 +67,7 @@ enum __kvm_host_smccc_func {
> >         __KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_hyp,
> >         __KVM_HOST_SMCCC_FUNC___pkvm_host_share_guest,
> >         __KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_guest,
> > +       __KVM_HOST_SMCCC_FUNC___pkvm_host_relax_guest_perms,
> >         __KVM_HOST_SMCCC_FUNC___kvm_adjust_pc,
> >         __KVM_HOST_SMCCC_FUNC___kvm_vcpu_run,
> >         __KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
> > diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > index e528a42ed60e..db0dd83c2457 100644
> > --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > @@ -41,6 +41,7 @@ int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
> >  int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
> >  int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu, enum kvm_pgtable_prot prot);
> >  int __pkvm_host_unshare_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm);
> > +int __pkvm_host_relax_guest_perms(u64 gfn, enum kvm_pgtable_prot prot, struct pkvm_hyp_vcpu *vcpu);
> 
> The parameters are the same as __pkvm_host_share_guest, but in a
> different order. I looked ahead at later patches in the series and saw
> similar issues regarding parameter type and ordering, so I won't mention
> them for the later patches.

Ack to this and the other comment below, thanks for the review!

> 
> >  bool addr_is_memory(phys_addr_t phys);
> >  int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
> > diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > index 04a9053ae1d5..60dd56bbd743 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > @@ -267,6 +267,25 @@ static void handle___pkvm_host_unshare_guest(struct kvm_cpu_context *host_ctxt)
> >         cpu_reg(host_ctxt, 1) =  ret;
> >  }
> >
> > +static void handle___pkvm_host_relax_guest_perms(struct kvm_cpu_context *host_ctxt)
> > +{
> > +       DECLARE_REG(u64, gfn, host_ctxt, 1);
> > +       DECLARE_REG(enum kvm_pgtable_prot, prot, host_ctxt, 2);
> > +       struct pkvm_hyp_vcpu *hyp_vcpu;
> > +       int ret = -EINVAL;
> > +
> > +       if (!is_protected_kvm_enabled())
> > +               goto out;
> > +
> > +       hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
> > +       if (!hyp_vcpu || pkvm_hyp_vcpu_is_protected(hyp_vcpu))
> > +               goto out;
> > +
> > +       ret = __pkvm_host_relax_guest_perms(gfn, prot, hyp_vcpu);
> > +out:
> > +       cpu_reg(host_ctxt, 1) = ret;
> > +}
> > +
> >  static void handle___kvm_adjust_pc(struct kvm_cpu_context *host_ctxt)
> >  {
> >         DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
> > @@ -478,6 +497,7 @@ static const hcall_t host_hcall[] = {
> >         HANDLE_FUNC(__pkvm_host_unshare_hyp),
> >         HANDLE_FUNC(__pkvm_host_share_guest),
> >         HANDLE_FUNC(__pkvm_host_unshare_guest),
> > +       HANDLE_FUNC(__pkvm_host_relax_guest_perms),
> >         HANDLE_FUNC(__kvm_adjust_pc),
> >         HANDLE_FUNC(__kvm_vcpu_run),
> >         HANDLE_FUNC(__kvm_flush_vm_context),
> > diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > index aa27a3e42e5e..d4b28e93e790 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > @@ -1480,3 +1480,26 @@ int __pkvm_host_unshare_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm)
> >
> >         return ret;
> >  }
> > +
> > +int __pkvm_host_relax_guest_perms(u64 gfn, enum kvm_pgtable_prot prot, struct pkvm_hyp_vcpu *vcpu)
> > +{
> > +       struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
> > +       u64 ipa = hyp_pfn_to_phys(gfn);
> > +       u64 phys;
> > +       int ret;
> > +
> > +       if ((prot & KVM_PGTABLE_PROT_RWX) != prot)
> > +               return -EPERM;
> 
> Why not
> 
> +       if (prot & ~KVM_PGTABLE_PROT_RWX)
> 
> Simpler and consistent with similar checks in the file (e.g.,
> __pkvm_host_share_guest)
> 
> Cheers,
> /fuad
> 
> 
> > +
> > +       host_lock_component();
> > +       guest_lock_component(vm);
> > +
> > +       ret = __check_host_unshare_guest(vm, &phys, ipa);
> > +       if (!ret)
> > +               ret = kvm_pgtable_stage2_relax_perms(&vm->pgt, ipa, prot, 0);
> > +
> > +       guest_unlock_component(vm);
> > +       host_unlock_component();
> > +
> > +       return ret;
> > +}
> > --
> > 2.47.0.338.g60cca15819-goog
> >



* Re: [PATCH v2 10/18] KVM: arm64: Introduce __pkvm_host_share_guest()
  2024-12-10 15:51       ` Fuad Tabba
@ 2024-12-11  9:58         ` Quentin Perret
  2024-12-11 10:07           ` Fuad Tabba
  0 siblings, 1 reply; 50+ messages in thread
From: Quentin Perret @ 2024-12-11  9:58 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Vincent Donnefort,
	Sebastian Ene, linux-arm-kernel, kvmarm, linux-kernel

On Tuesday 10 Dec 2024 at 15:51:01 (+0000), Fuad Tabba wrote:
> On Tue, 10 Dec 2024 at 15:41, Quentin Perret <qperret@google.com> wrote:
> > > Initially I thought the comment was related to the warning below,
> > > which confused me.
> >
> > It actually is about the warning below :-)
> >
> > > Now I think what you're trying to say is that we'll
> > > allow the share, and the (unrelated to the comment) warning is to
> > > ensure that the PKVM_PAGE_SHARED_OWNED is consistent with the share
> > > count.
> >
> > So, the only case where the host should ever attempt to use
> > __pkvm_host_share_guest() on a page that is already shared is for a page
> > already shared *with an np-guest*. The page->host_share_guest_count being
> > elevated is the easiest way to check that the page is indeed in that
> > state, hence the warning.
> >
> > If for example the host was trying to share with an np-guest a page that
> > is currently shared with the hypervisor, that check would fail. We can
> > discuss whether or not we would want to allow it, but for now there is
> > strictly no need for it so I went with the restrictive option. We can
> > relax that constraint later if need be.
> >
> > > I think what you should have here, which would work better with the
> > > comment, is something like:
> > >
> > >                 /* Only host to np-guest multi-sharing is tolerated */
> > > +               if (pkvm_hyp_vcpu_is_protected(vcpu))
> > > +                       return -EPERM;
> > >
> > > That would even make the comment unnecessary.
> >
> > I would prefer not adding this here, handle___pkvm_host_share_guest() in
> > hyp-main.c already does that for us.
> 
> I understand now, and I agree that an additional check isn't
> necessary. Could you clarify the comment though? It's the word "only"
> that threw me off, since to me it implied that the check was enforcing
> the word "only". Maybe:
> 
> >                 /* Tolerate host to np-guest multi-sharing. */

I guess 'only' is somewhat important, it is the _only_ type of
multi-sharing that we allow and the check enforces precisely that. The
WARN_ON() will be triggered for any other type of multi-sharing, so we
are really checking that _only_ np-guest multi-sharing goes through.

Perhaps the confusing part is that the code as-is relies on WARN_ON()
being fatal for the enforcement. Would it help if I changed the 'break'
statement right after to 'fallthrough' so we proceed to return -EPERM?
In practice we won't return anything as the hypervisor will panic, but
I presume it is better from a logic perspective.

Cheers,
Quentin



* Re: [PATCH v2 16/18] KVM: arm64: Introduce __pkvm_tlb_flush_vmid()
  2024-12-10 15:23   ` Fuad Tabba
@ 2024-12-11 10:03     ` Quentin Perret
  2024-12-11 10:21       ` Fuad Tabba
  0 siblings, 1 reply; 50+ messages in thread
From: Quentin Perret @ 2024-12-11 10:03 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Vincent Donnefort,
	Sebastian Ene, linux-arm-kernel, kvmarm, linux-kernel

On Tuesday 10 Dec 2024 at 15:23:02 (+0000), Fuad Tabba wrote:
> Hi Quentin,
> 
> On Tue, 3 Dec 2024 at 10:38, Quentin Perret <qperret@google.com> wrote:
> >
> > Introduce a new hypercall to flush the TLBs of non-protected guests. The
> > host kernel will be responsible for issuing this hypercall after changing
> > stage-2 permissions using the __pkvm_host_relax_guest_perms() or
> > __pkvm_host_wrprotect_guest() paths. This is left under the host's
> > responsibility for performance reasons.
> >
> > Note however that the TLB maintenance for all *unmap* operations still
> > remains entirely under the hypervisor's responsibility for security
> > reasons -- an unmapped page may be donated to another entity, so a stale
> > TLB entry could be used to leak private data.
> >
> > Signed-off-by: Quentin Perret <qperret@google.com>
> > ---
> >  arch/arm64/include/asm/kvm_asm.h   |  1 +
> >  arch/arm64/kvm/hyp/nvhe/hyp-main.c | 17 +++++++++++++++++
> >  2 files changed, 18 insertions(+)
> >
> > diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> > index 6178e12a0dbc..df6237d0459c 100644
> > --- a/arch/arm64/include/asm/kvm_asm.h
> > +++ b/arch/arm64/include/asm/kvm_asm.h
> > @@ -87,6 +87,7 @@ enum __kvm_host_smccc_func {
> >         __KVM_HOST_SMCCC_FUNC___pkvm_teardown_vm,
> >         __KVM_HOST_SMCCC_FUNC___pkvm_vcpu_load,
> >         __KVM_HOST_SMCCC_FUNC___pkvm_vcpu_put,
> > +       __KVM_HOST_SMCCC_FUNC___pkvm_tlb_flush_vmid,
> >  };
> >
> >  #define DECLARE_KVM_VHE_SYM(sym)       extern char sym[]
> > diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > index de0012a75827..219d7fb850ec 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > @@ -398,6 +398,22 @@ static void handle___kvm_tlb_flush_vmid(struct kvm_cpu_context *host_ctxt)
> >         __kvm_tlb_flush_vmid(kern_hyp_va(mmu));
> >  }
> >
> > +static void handle___pkvm_tlb_flush_vmid(struct kvm_cpu_context *host_ctxt)
> > +{
> > +       DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
> > +       struct pkvm_hyp_vm *hyp_vm;
> > +
> > +       if (!is_protected_kvm_enabled())
> > +               return;
> > +
> > +       hyp_vm = get_pkvm_hyp_vm(handle);
> > +       if (!hyp_vm)
> > +               return;
> > +
> > +       __kvm_tlb_flush_vmid(&hyp_vm->kvm.arch.mmu);
> > +       put_pkvm_hyp_vm(hyp_vm);
> > +}
> 
> Since this is practically the same as kvm_tlb_flush_vmid(), does it
> make sense to modify that instead (handle___kvm_tlb_flush_vmid()) to
> do the right thing depending on whether pkvm is enabled? Thinking as
> well for the future in case we want to support the rest of the
> kvm_tlb_flush_vmid_*().

I considered it, but the two implementations want different arguments --
pkvm wants the handle while standard KVM uses the kvm struct address
directly. I had an implementation at some point that multiplexed the
implementations on a single HVC (we'd interpret the arguments
differently depending on pKVM being enabled or not) but that felt more
error prone than simply having two HVCs.

Happy to reconsider if we can find a good way to make it work though.
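Here is a rough host-side sketch of the two-HVC design discussed above. Each hypercall ID carries exactly one argument convention: either a pKVM VM handle or the address of the VM's stage-2 MMU structure. The host decides which ID to issue. All names here (sketch_hvc(), the SKETCH_FUNC_* IDs, struct sketch_vm) are hypothetical stand-ins, not the real KVM/arm64 interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hypercall IDs, one per argument convention. */
enum sketch_func_id {
	SKETCH_FUNC_KVM_TLB_FLUSH_VMID,  /* arg: address of the VM's MMU */
	SKETCH_FUNC_PKVM_TLB_FLUSH_VMID, /* arg: pKVM VM handle */
};

struct sketch_mmu { int vmid; };

struct sketch_vm {
	uint32_t pkvm_handle;     /* only meaningful when pKVM is enabled */
	struct sketch_mmu mmu;
};

/* Records the last hypercall so the dispatch can be inspected. */
static enum sketch_func_id last_func;
static uintptr_t last_arg;

/* Stand-in for issuing an HVC to EL2. */
static void sketch_hvc(enum sketch_func_id id, uintptr_t arg)
{
	last_func = id;
	last_arg = arg;
}

/*
 * Host-side choice: the argument's meaning is fixed by the ID, so EL2
 * never has to guess whether it received a handle or a pointer.
 */
static void sketch_flush_vmid(struct sketch_vm *vm, bool pkvm_enabled)
{
	if (pkvm_enabled)
		sketch_hvc(SKETCH_FUNC_PKVM_TLB_FLUSH_VMID, vm->pkvm_handle);
	else
		sketch_hvc(SKETCH_FUNC_KVM_TLB_FLUSH_VMID, (uintptr_t)&vm->mmu);
}
```

The multiplexed alternative would instead have EL2 reinterpret a single argument depending on whether pKVM is enabled. That reinterpretation is the error-prone part mentioned above.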



* Re: [PATCH v2 10/18] KVM: arm64: Introduce __pkvm_host_share_guest()
  2024-12-11  9:58         ` Quentin Perret
@ 2024-12-11 10:07           ` Fuad Tabba
  2024-12-11 10:14             ` Quentin Perret
  0 siblings, 1 reply; 50+ messages in thread
From: Fuad Tabba @ 2024-12-11 10:07 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Vincent Donnefort,
	Sebastian Ene, linux-arm-kernel, kvmarm, linux-kernel

On Wed, 11 Dec 2024 at 09:58, Quentin Perret <qperret@google.com> wrote:
>
> On Tuesday 10 Dec 2024 at 15:51:01 (+0000), Fuad Tabba wrote:
> > On Tue, 10 Dec 2024 at 15:41, Quentin Perret <qperret@google.com> wrote:
> > > > Initially I thought the comment was related to the warning below,
> > > > which confused me.
> > >
> > > It actually is about the warning below :-)
> > >
> > > > Now I think what you're trying to say is that we'll
> > > > allow the share, and the (unrelated to the comment) warning is to
> > > > ensure that the PKVM_PAGE_SHARED_OWNED is consistent with the share
> > > > count.
> > >
> > > So, the only case where the host should ever attempt to use
> > > __pkvm_host_share_guest() on a page that is already shared is for a page
> > > already shared *with an np-guest*. The page->host_share_guest_count being
> > > elevated is the easiest way to check that the page is indeed in that
> > > state, hence the warning.
> > >
> > > If for example the host was trying to share with an np-guest a page that
> > > is currently shared with the hypervisor, that check would fail. We can
> > > discuss whether or not we would want to allow it, but for now there is
> > > strictly no need for it so I went with the restrictive option. We can
> > > relax that constraint later if need be.
> > >
> > > > I think what you should have here, which would work better with the
> > > > comment, is something like:
> > > >
> > > >                 /* Only host to np-guest multi-sharing is tolerated */
> > > > +               if (pkvm_hyp_vcpu_is_protected(vcpu))
> > > > +                       return -EPERM;
> > > >
> > > > That would even make the comment unnecessary.
> > >
> > > I would prefer not adding this here, handle___pkvm_host_share_guest() in
> > > hyp-main.c already does that for us.
> >
> > I understand now, and I agree that an additional check isn't
> > necessary. Could you clarify the comment though? It's the word "only"
> > that threw me off, since to me it implied that the check was enforcing
> > the word "only". Maybe:
> >
> > >                 /* Tolerate host to np-guest multi-sharing. */
>
> I guess 'only' is somewhat important, it is the _only_ type of
> multi-sharing that we allow and the check enforces precisely that. The
> WARN_ON() will be triggered for any other type of multi-sharing, so we
> are really checking that _only_ np-guest multi-sharing goes through.
>
> Perhaps the confusing part is that the code as-is relies on WARN_ON()
> being fatal for the enforcement. Would it help if I changed the 'break'
> statement right after to 'fallthrough' so we proceed to return -EPERM?
> In practice we won't return anything as the hypervisor will panic, but
> I presume it is better from a logic perspective.

It would, but then we wouldn't be tolerating np-guest multisharing,
but like you said, it's not like we're tolerating it now anyway.

I wonder if it would be better simply not to allow multisharing at all for now.

Cheers,
/fuad


> Cheers,
> Quentin



* Re: [PATCH v2 15/18] KVM: arm64: Introduce __pkvm_host_mkyoung_guest()
  2024-12-10 19:46     ` Quentin Perret
@ 2024-12-11 10:11       ` Fuad Tabba
  2024-12-11 10:18         ` Quentin Perret
  0 siblings, 1 reply; 50+ messages in thread
From: Fuad Tabba @ 2024-12-11 10:11 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Vincent Donnefort,
	Sebastian Ene, linux-arm-kernel, kvmarm, linux-kernel

Hi Quentin,

On Tue, 10 Dec 2024 at 19:46, Quentin Perret <qperret@google.com> wrote:
>
> On Tuesday 10 Dec 2024 at 15:14:03 (+0000), Fuad Tabba wrote:
> > > +int __pkvm_host_mkyoung_guest(u64 gfn, struct pkvm_hyp_vcpu *vcpu)
> > > +{
> > > +       struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
> > > +       u64 ipa = hyp_pfn_to_phys(gfn);
> > > +       u64 phys;
> > > +       int ret;
> > > +
> > > +       host_lock_component();
> > > +       guest_lock_component(vm);
> > > +
> > > +       ret = __check_host_unshare_guest(vm, &phys, ipa);
> >
> > While I'm bikeshedding some more, does the name
> > __check_host_unshare_guest() make sense? Should it be something like
> > __check_host_changeperm_guest(), or something along those lines? (feel
> > free to ignore this :) )
>
> I understand the comment, but not a huge fan of 'changeperm' as that
> sounds like we're only allowing permission changes while we use this
> all over the place. Maybe __check_host_is_shared_guest()? Naming is
> hard, so happy to take suggestions :-)

I've gone and done it now :) I almost like that, it's the *is* part I
don't like since it implies a boolean return. Maybe just
__check_host_shared_guest(), no is?

Cheers,
/fuad



* Re: [PATCH v2 10/18] KVM: arm64: Introduce __pkvm_host_share_guest()
  2024-12-11 10:07           ` Fuad Tabba
@ 2024-12-11 10:14             ` Quentin Perret
  2024-12-11 10:21               ` Quentin Perret
  0 siblings, 1 reply; 50+ messages in thread
From: Quentin Perret @ 2024-12-11 10:14 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Vincent Donnefort,
	Sebastian Ene, linux-arm-kernel, kvmarm, linux-kernel

On Wednesday 11 Dec 2024 at 10:07:16 (+0000), Fuad Tabba wrote:
> On Wed, 11 Dec 2024 at 09:58, Quentin Perret <qperret@google.com> wrote:
> >
> > On Tuesday 10 Dec 2024 at 15:51:01 (+0000), Fuad Tabba wrote:
> > > On Tue, 10 Dec 2024 at 15:41, Quentin Perret <qperret@google.com> wrote:
> > > > > Initially I thought the comment was related to the warning below,
> > > > > which confused me.
> > > >
> > > > It actually is about the warning below :-)
> > > >
> > > > > Now I think what you're trying to say is that we'll
> > > > > allow the share, and the (unrelated to the comment) warning is to
> > > > > ensure that the PKVM_PAGE_SHARED_OWNED is consistent with the share
> > > > > count.
> > > >
> > > > So, the only case where the host should ever attempt to use
> > > > __pkvm_host_share_guest() on a page that is already shared is for a page
> > > > already shared *with an np-guest*. The page->host_share_guest_count being
> > > > elevated is the easiest way to check that the page is indeed in that
> > > > state, hence the warning.
> > > >
> > > > If for example the host was trying to share with an np-guest a page that
> > > > is currently shared with the hypervisor, that check would fail. We can
> > > > discuss whether or not we would want to allow it, but for now there is
> > > > strictly no need for it so I went with the restrictive option. We can
> > > > relax that constraint later if need be.
> > > >
> > > > > I think what you should have here, which would work better with the
> > > > > comment, is something like:
> > > > >
> > > > >                 /* Only host to np-guest multi-sharing is tolerated */
> > > > > +               if (pkvm_hyp_vcpu_is_protected(vcpu))
> > > > > +                       return -EPERM;
> > > > >
> > > > > That would even make the comment unnecessary.
> > > >
> > > > I would prefer not adding this here, handle___pkvm_host_share_guest() in
> > > > hyp-main.c already does that for us.
> > >
> > > I understand now, and I agree that an additional check isn't
> > > necessary. Could you clarify the comment though? It's the word "only"
> > > that threw me off, since to me it implied that the check was enforcing
> > > the word "only". Maybe:
> > >
> > > >                 /* Tolerate host to np-guest multi-sharing. */
> >
> > I guess 'only' is somewhat important, it is the _only_ type of
> > multi-sharing that we allow and the check enforces precisely that. The
> > WARN_ON() will be triggered for any other type of multi-sharing, so we
> > are really checking that _only_ np-guest multi-sharing goes through.
> >
> > Perhaps the confusing part is that the code as-is relies on WARN_ON()
> > being fatal for the enforcement. Would it help if I changed the 'break'
> > statement right after to 'fallthrough' so we proceed to return -EPERM?
> > In practice we won't return anything as the hypervisor will panic, but
> > I presume it is better from a logic perspective.
> 
> It would, but then we wouldn't be tolerating np-guest multisharing,
> but like you said, it's not like we're tolerating it now anyway.
> 
> I wonder if it would be better simply not to allow multisharing at all for now.

That would mean turning off MMU notifiers in the host and taking
long-term GUP pins on np-guest pages I think. Multi-sharing can be
caused by many things (KSM, the zero page, ...), so we'd need to turn
all of that off (IOW, no MMU notifiers).

That's more or less the status quo in Android, but I vote for not going
down that path upstream. pKVM should ideally be transparent for np-guest
support if at all possible.

Thanks,
Quentin
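This example illustrates why a per-page count is needed rather than a single shared flag. Once MMU notifiers, KSM or the zero page are involved, one host page can back several np-guest IPAs at the same time. It may only return to the host-owned state when the last of those mappings goes away. The model below is simplified and assumes a hypothetical struct sketch_page. It is loosely modelled on the host_share_guest_count discussed in this thread, not on the real hyp_page layout.

```c
#include <assert.h>
#include <errno.h>

enum sketch_state { SKETCH_OWNED, SKETCH_SHARED_OWNED };

/* Hypothetical per-page metadata kept by the hypervisor. */
struct sketch_page {
	enum sketch_state state;
	unsigned int share_guest_count; /* np-guest mappings of this page */
};

/* Map the page into one more np-guest IPA. */
static int sketch_share_guest(struct sketch_page *p)
{
	/* Shared, but not with an np-guest (e.g. with the hypervisor). */
	if (p->state == SKETCH_SHARED_OWNED && p->share_guest_count == 0)
		return -EPERM;
	p->state = SKETCH_SHARED_OWNED;
	p->share_guest_count++;
	return 0;
}

/* Drop one np-guest mapping; the last one returns the page to the host. */
static int sketch_unshare_guest(struct sketch_page *p)
{
	if (p->state != SKETCH_SHARED_OWNED || p->share_guest_count == 0)
		return -EPERM;
	if (--p->share_guest_count == 0)
		p->state = SKETCH_OWNED;
	return 0;
}
```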



* Re: [PATCH v2 15/18] KVM: arm64: Introduce __pkvm_host_mkyoung_guest()
  2024-12-11 10:11       ` Fuad Tabba
@ 2024-12-11 10:18         ` Quentin Perret
  0 siblings, 0 replies; 50+ messages in thread
From: Quentin Perret @ 2024-12-11 10:18 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Vincent Donnefort,
	Sebastian Ene, linux-arm-kernel, kvmarm, linux-kernel

On Wednesday 11 Dec 2024 at 10:11:17 (+0000), Fuad Tabba wrote:
> Hi Quentin,
> 
> On Tue, 10 Dec 2024 at 19:46, Quentin Perret <qperret@google.com> wrote:
> >
> > On Tuesday 10 Dec 2024 at 15:14:03 (+0000), Fuad Tabba wrote:
> > > > +int __pkvm_host_mkyoung_guest(u64 gfn, struct pkvm_hyp_vcpu *vcpu)
> > > > +{
> > > > +       struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
> > > > +       u64 ipa = hyp_pfn_to_phys(gfn);
> > > > +       u64 phys;
> > > > +       int ret;
> > > > +
> > > > +       host_lock_component();
> > > > +       guest_lock_component(vm);
> > > > +
> > > > +       ret = __check_host_unshare_guest(vm, &phys, ipa);
> > >
> > > While I'm bikeshedding some more, does the name
> > > __check_host_unshare_guest() make sense? Should it be something like
> > > __check_host_changeperm_guest(), or something along those lines? (feel
> > > free to ignore this :) )
> >
> > I understand the comment, but not a huge fan of 'changeperm' as that
> > sounds like we're only allowing permission changes while we use this
> > all over the place. Maybe __check_host_is_shared_guest()? Naming is
> > hard, so happy to take suggestions :-)
> 
> I've gone and done it now :) I almost like that, it's the *is* part I
> don't like since it implies a boolean return. Maybe just
> __check_host_shared_guest(), no is?

Deal!

Cheers,
Quentin



* Re: [PATCH v2 16/18] KVM: arm64: Introduce __pkvm_tlb_flush_vmid()
  2024-12-11 10:03     ` Quentin Perret
@ 2024-12-11 10:21       ` Fuad Tabba
  0 siblings, 0 replies; 50+ messages in thread
From: Fuad Tabba @ 2024-12-11 10:21 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Vincent Donnefort,
	Sebastian Ene, linux-arm-kernel, kvmarm, linux-kernel

Hi Quentin,

On Wed, 11 Dec 2024 at 10:03, Quentin Perret <qperret@google.com> wrote:
>
> On Tuesday 10 Dec 2024 at 15:23:02 (+0000), Fuad Tabba wrote:
> > Hi Quentin,
> >
> > On Tue, 3 Dec 2024 at 10:38, Quentin Perret <qperret@google.com> wrote:
> > >
> > > Introduce a new hypercall to flush the TLBs of non-protected guests. The
> > > host kernel will be responsible for issuing this hypercall after changing
> > > stage-2 permissions using the __pkvm_host_relax_guest_perms() or
> > > __pkvm_host_wrprotect_guest() paths. This is left under the host's
> > > responsibility for performance reasons.
> > >
> > > Note however that the TLB maintenance for all *unmap* operations still
> > > remains entirely under the hypervisor's responsibility for security
> > > reasons -- an unmapped page may be donated to another entity, so a stale
> > > TLB entry could be used to leak private data.
> > >
> > > Signed-off-by: Quentin Perret <qperret@google.com>
> > > ---
> > >  arch/arm64/include/asm/kvm_asm.h   |  1 +
> > >  arch/arm64/kvm/hyp/nvhe/hyp-main.c | 17 +++++++++++++++++
> > >  2 files changed, 18 insertions(+)
> > >
> > > diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> > > index 6178e12a0dbc..df6237d0459c 100644
> > > --- a/arch/arm64/include/asm/kvm_asm.h
> > > +++ b/arch/arm64/include/asm/kvm_asm.h
> > > @@ -87,6 +87,7 @@ enum __kvm_host_smccc_func {
> > >         __KVM_HOST_SMCCC_FUNC___pkvm_teardown_vm,
> > >         __KVM_HOST_SMCCC_FUNC___pkvm_vcpu_load,
> > >         __KVM_HOST_SMCCC_FUNC___pkvm_vcpu_put,
> > > +       __KVM_HOST_SMCCC_FUNC___pkvm_tlb_flush_vmid,
> > >  };
> > >
> > >  #define DECLARE_KVM_VHE_SYM(sym)       extern char sym[]
> > > diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > > index de0012a75827..219d7fb850ec 100644
> > > --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > > +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > > @@ -398,6 +398,22 @@ static void handle___kvm_tlb_flush_vmid(struct kvm_cpu_context *host_ctxt)
> > >         __kvm_tlb_flush_vmid(kern_hyp_va(mmu));
> > >  }
> > >
> > > +static void handle___pkvm_tlb_flush_vmid(struct kvm_cpu_context *host_ctxt)
> > > +{
> > > +       DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
> > > +       struct pkvm_hyp_vm *hyp_vm;
> > > +
> > > +       if (!is_protected_kvm_enabled())
> > > +               return;
> > > +
> > > +       hyp_vm = get_pkvm_hyp_vm(handle);
> > > +       if (!hyp_vm)
> > > +               return;
> > > +
> > > +       __kvm_tlb_flush_vmid(&hyp_vm->kvm.arch.mmu);
> > > +       put_pkvm_hyp_vm(hyp_vm);
> > > +}
> >
> > Since this is practically the same as kvm_tlb_flush_vmid(), does it
> > make sense to modify that instead (handle___kvm_tlb_flush_vmid()) to
> > do the right thing depending on whether pkvm is enabled? Thinking as
> > well for the future in case we want to support the rest of the
> > kvm_tlb_flush_vmid_*().
>
> I considered it, but the two implementations want different arguments --
> pkvm wants the handle while standard KVM uses the kvm struct address
> directly. I had an implementation at some point that multiplexed the
> implementations on a single HVC (we'd interpret the arguments
> differently depending on pKVM being enabled or not) but that felt more
> error prone than simply having two HVCs.
>
> Happy to reconsider if we can find a good way to make it work though.

I don't have a strong opinion about this. I think that for now, since
it's only this function, it's probably fine. That said, the
multiplexing (as of patch 18, which I haven't reviewed yet) is just
lifted higher up to the host kernel, albeit with fewer parameters to
wiggle around.

To summarize, I think we can worry about it if/once we need the other
tlb_flush_* variants.

Cheers,
/fuad



* Re: [PATCH v2 10/18] KVM: arm64: Introduce __pkvm_host_share_guest()
  2024-12-11 10:14             ` Quentin Perret
@ 2024-12-11 10:21               ` Quentin Perret
  2024-12-11 10:32                 ` Fuad Tabba
  0 siblings, 1 reply; 50+ messages in thread
From: Quentin Perret @ 2024-12-11 10:21 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Vincent Donnefort,
	Sebastian Ene, linux-arm-kernel, kvmarm, linux-kernel

On Wednesday 11 Dec 2024 at 10:14:51 (+0000), Quentin Perret wrote:
> On Wednesday 11 Dec 2024 at 10:07:16 (+0000), Fuad Tabba wrote:
> > On Wed, 11 Dec 2024 at 09:58, Quentin Perret <qperret@google.com> wrote:
> > >
> > > On Tuesday 10 Dec 2024 at 15:51:01 (+0000), Fuad Tabba wrote:
> > > > On Tue, 10 Dec 2024 at 15:41, Quentin Perret <qperret@google.com> wrote:
> > > > > > Initially I thought the comment was related to the warning below,
> > > > > > which confused me.
> > > > >
> > > > > It actually is about the warning below :-)
> > > > >
> > > > > > Now I think what you're trying to say is that we'll
> > > > > > allow the share, and the (unrelated to the comment) warning is to
> > > > > > ensure that the PKVM_PAGE_SHARED_OWNED is consistent with the share
> > > > > > count.
> > > > >
> > > > > So, the only case where the host should ever attempt to use
> > > > > __pkvm_host_share_guest() on a page that is already shared is for a page
> > > > > already shared *with an np-guest*. The page->host_share_guest_count being
> > > > > elevated is the easiest way to check that the page is indeed in that
> > > > > state, hence the warning.
> > > > >
> > > > > If for example the host was trying to share with an np-guest a page that
> > > > > is currently shared with the hypervisor, that check would fail. We can
> > > > > discuss whether or not we would want to allow it, but for now there is
> > > > > strictly no need for it so I went with the restrictive option. We can
> > > > > relax that constraint later if need be.
> > > > >
> > > > > > I think what you should have here, which would work better with the
> > > > > > comment, is something like:
> > > > > >
> > > > > >                 /* Only host to np-guest multi-sharing is tolerated */
> > > > > > +               if (pkvm_hyp_vcpu_is_protected(vcpu))
> > > > > > +                       return -EPERM;
> > > > > >
> > > > > > That would even make the comment unnecessary.
> > > > >
> > > > > I would prefer not adding this here, handle___pkvm_host_share_guest() in
> > > > > hyp-main.c already does that for us.
> > > >
> > > > I understand now, and I agree that an additional check isn't
> > > > necessary. Could you clarify the comment though? It's the word "only"
> > > > that threw me off, since to me it implied that the check was enforcing
> > > > the word "only". Maybe:
> > > >
> > > > >                 /* Tolerate host to np-guest multi-sharing. */
> > >
> > > I guess 'only' is somewhat important, it is the _only_ type of
> > > multi-sharing that we allow and the check enforces precisely that. The
> > > WARN_ON() will be triggered for any other type of multi-sharing, so we
> > > are really checking that _only_ np-guest multi-sharing goes through.
> > >
> > > Perhaps the confusing part is that the code as-is relies on WARN_ON()
> > > being fatal for the enforcement. Would it help if I changed the 'break'
> > > statement right after to 'fallthrough' so we proceed to return -EPERM?
> > > In practice we won't return anything as the hypervisor will panic, but
> > > I presume it is better from a logic perspective.
> > 
> > It would, but then we wouldn't be tolerating np-guest multisharing,
> > but like you said, it's not like we're tolerating it now anyway.
> > 
> > I wonder if it would be better simply not to allow multisharing at all for now.
> 
> That would mean turning off MMU notifiers in the host and taking
> long-term GUP pins on np-guest pages I think. Multi-sharing can be
> caused by many things (KSM, the zero page, ...), so we'd need to turn
> all of that off (IOW, no MMU notifiers).
> 
> That's more or less the status quo in Android, but I vote for not going
> down that path upstream. pKVM should ideally be transparent for np-guest
> support if at all possible.

And to clarify my suggestion above, we should fallthrough IFF
host_share_guest_count is 0, but break otherwise to retain multi-sharing
support. So it's not a simple s/break/fallthrough change, that needs a
tiny bit of added logic.
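The logic described above could take the following shape, shown here as a userspace sketch. A page that is already in the shared state may be shared again only when its np-guest share count is non-zero. Otherwise, execution falls through to the -EPERM path instead of relying on a fatal WARN_ON(). The state names, the function name and the fallthrough fallback are assumptions for illustration, and the WARN_ON() from the patch is omitted.

```c
#include <assert.h>
#include <errno.h>

/* Userspace fallback; the kernel provides its own fallthrough macro. */
#ifndef fallthrough
# if defined(__has_attribute)
#  if __has_attribute(__fallthrough__)
#   define fallthrough __attribute__((__fallthrough__))
#  endif
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0) /* fallthrough */
#endif

enum sketch_page_state {
	SKETCH_PAGE_OWNED,
	SKETCH_PAGE_SHARED_OWNED,
	SKETCH_PAGE_SHARED_BORROWED,
};

/*
 * Hypothetical check: SHARED_OWNED is accepted only if np-guests already
 * hold the page (non-zero count); every other case ends in -EPERM.
 */
static int sketch_check_host_share_guest(enum sketch_page_state state,
					 unsigned int share_guest_count)
{
	switch (state) {
	case SKETCH_PAGE_OWNED:
		break;
	case SKETCH_PAGE_SHARED_OWNED:
		/* Only host to np-guest multi-sharing is tolerated */
		if (share_guest_count)
			break;
		fallthrough;
	default:
		return -EPERM;
	}
	return 0;
}
```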



* Re: [PATCH v2 10/18] KVM: arm64: Introduce __pkvm_host_share_guest()
  2024-12-11 10:21               ` Quentin Perret
@ 2024-12-11 10:32                 ` Fuad Tabba
  0 siblings, 0 replies; 50+ messages in thread
From: Fuad Tabba @ 2024-12-11 10:32 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Vincent Donnefort,
	Sebastian Ene, linux-arm-kernel, kvmarm, linux-kernel

On Wed, 11 Dec 2024 at 10:21, Quentin Perret <qperret@google.com> wrote:
>
> On Wednesday 11 Dec 2024 at 10:14:51 (+0000), Quentin Perret wrote:
> > On Wednesday 11 Dec 2024 at 10:07:16 (+0000), Fuad Tabba wrote:
> > > On Wed, 11 Dec 2024 at 09:58, Quentin Perret <qperret@google.com> wrote:
> > > >
> > > > On Tuesday 10 Dec 2024 at 15:51:01 (+0000), Fuad Tabba wrote:
> > > > > On Tue, 10 Dec 2024 at 15:41, Quentin Perret <qperret@google.com> wrote:
> > > > > > > Initially I thought the comment was related to the warning below,
> > > > > > > which confused me.
> > > > > >
> > > > > > It actually is about the warning below :-)
> > > > > >
> > > > > > > Now I think what you're trying to say is that we'll
> > > > > > > allow the share, and the (unrelated to the comment) warning is to
> > > > > > > ensure that the PKVM_PAGE_SHARED_OWNED is consistent with the share
> > > > > > > count.
> > > > > >
> > > > > > So, the only case where the host should ever attempt to use
> > > > > > __pkvm_host_share_guest() on a page that is already shared is for a page
> > > > > > already shared *with an np-guest*. The page->host_share_guest_count being
> > > > > > elevated is the easiest way to check that the page is indeed in that
> > > > > > state, hence the warning.
> > > > > >
> > > > > > If for example the host was trying to share with an np-guest a page that
> > > > > > is currently shared with the hypervisor, that check would fail. We can
> > > > > > discuss whether or not we would want to allow it, but for now there is
> > > > > > strictly no need for it so I went with the restrictive option. We can
> > > > > > relax that constraint later if need be.
> > > > > >
> > > > > > > I think what you should have here, which would work better with the
> > > > > > > comment, is something like:
> > > > > > >
> > > > > > >                 /* Only host to np-guest multi-sharing is tolerated */
> > > > > > > +               if (pkvm_hyp_vcpu_is_protected(vcpu))
> > > > > > > +                       return -EPERM;
> > > > > > >
> > > > > > > That would even make the comment unnecessary.
> > > > > >
> > > > > > I would prefer not adding this here, handle___pkvm_host_share_guest() in
> > > > > > hyp-main.c already does that for us.
> > > > >
> > > > > I understand now, and I agree that an additional check isn't
> > > > > necessary. Could you clarify the comment though? It's the word "only"
> > > > > that threw me off, since to me it implied that the check was enforcing
> > > > > the word "only". Maybe:
> > > > >
> > > > > >                 /* Tolerate host to np-guest multi-sharing. */
> > > >
> > > > I guess 'only' is somewhat important, it is the _only_ type of
> > > > multi-sharing that we allow and the check enforces precisely that. The
> > > > WARN_ON() will be triggered for any other type of multi-sharing, so we
> > > > are really checking that _only_ np-guest multi-sharing goes through.
> > > >
> > > > Perhaps the confusing part is that the code as-is relies on WARN_ON()
> > > > being fatal for the enforcement. Would it help if I changed the 'break'
> > > > statement right after to 'fallthrough' so we proceed to return -EPERM?
> > > > In practice we won't return anything as the hypervisor will panic, but
> > > > I presume it is better from a logic perspective.
> > >
> > > It would, but then we wouldn't be tolerating np-guest multisharing,
> > > but like you said, it's not like we're tolerating it now anyway.
> > >
> > > I wonder if it would be better simply not to allow multisharing at all for now.
> >
> > That would mean turning off MMU notifiers in the host and taking
> > long-term GUP pins on np-guest pages I think. Multi-sharing can be
> > caused by many things (KSM, the zero page, ...), so we'd need to turn
> > all of that off (IOW, no MMU notifiers).
> >
> > That's more or less the status quo in Android, but I vote for not going
> > down that path upstream. pKVM should ideally be transparent for np-guest
> > support if at all possible.
>
> And to clarify my suggestion above, we should fallthrough IFF
> host_share_guest_count is 0, but break otherwise to retain multi-sharing
> support. So it's not a simple s/break/fallthrough change, that needs a
> tiny bit of added logic.

I think this would make things clearer. Thanks.

/fuad



* Re: [PATCH v2 17/18] KVM: arm64: Introduce the EL1 pKVM MMU
  2024-12-03 10:37 ` [PATCH v2 17/18] KVM: arm64: Introduce the EL1 pKVM MMU Quentin Perret
@ 2024-12-12 11:35   ` Marc Zyngier
  2024-12-12 12:03     ` Quentin Perret
  0 siblings, 1 reply; 50+ messages in thread
From: Marc Zyngier @ 2024-12-12 11:35 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	Catalin Marinas, Will Deacon, Fuad Tabba, Vincent Donnefort,
	Sebastian Ene, linux-arm-kernel, kvmarm, linux-kernel

On Tue, 03 Dec 2024 10:37:34 +0000,
Quentin Perret <qperret@google.com> wrote:
> 
> Introduce a set of helper functions allowing manipulation of the pKVM
> guest stage-2 page-tables from EL1 using pKVM's HVC interface.
> 
> Each helper has an exact one-to-one correspondence with the traditional
> kvm_pgtable_stage2_*() functions from pgtable.c, with a strictly
> matching prototype. This will ease plumbing later on in mmu.c.
> 
> These callbacks track the gfn->pfn mappings in a simple rb_tree indexed
> by IPA in lieu of a page-table. This rb-tree is kept in sync with pKVM's
> state and is protected by a new rwlock -- the existing mmu_lock
> protection does not suffice in the map() path where the tree must be
> modified while user_mem_abort() only acquires a read_lock.
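Because each pkvm_pgtable_*() helper has exactly the same prototype as its kvm_pgtable_stage2_*() counterpart, a caller can choose the back-end at the call site with a single expression. A minimal userspace sketch follows; PGT_FN(), stage2_unmap(), pkvm_unmap() and pkvm_enabled (a stand-in for is_protected_kvm_enabled()) are hypothetical names, and this is not necessarily how the mmu.c plumbing in patch 18 is done.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t u64;

/* Hypothetical stand-in for struct kvm_pgtable, counting calls per back-end. */
struct pgtable {
	int sw_calls;
	int pkvm_calls;
};

/* Regular back-end (think kvm_pgtable_stage2_unmap()). */
static int stage2_unmap(struct pgtable *pgt, u64 addr, u64 size)
{
	(void)addr;
	(void)size;
	pgt->sw_calls++;
	return 0;
}

/* pKVM back-end (think pkvm_pgtable_unmap()): strictly the same prototype. */
static int pkvm_unmap(struct pgtable *pgt, u64 addr, u64 size)
{
	(void)addr;
	(void)size;
	pgt->pkvm_calls++;
	return 0;
}

/* Stand-in for is_protected_kvm_enabled(). */
static bool pkvm_enabled;

/*
 * Both operands of ?: have the same function type, so the result is a
 * function pointer that can be called directly.
 */
#define PGT_FN(fn) (pkvm_enabled ? pkvm_##fn : stage2_##fn)
```

For example, PGT_FN(unmap)(pgt, addr, size) calls pkvm_unmap() when pkvm_enabled is true and stage2_unmap() otherwise. If the prototypes ever diverged, the conditional expression would no longer type-check, which is one reason a strictly matching prototype helps.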
> 
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
> 
> The embedded union inside struct kvm_pgtable is arguably a bit horrible
> currently... I considered making the pgt argument to all kvm_pgtable_*()
> functions an opaque void * ptr, and moving the definition of
> struct kvm_pgtable to pgtable.c and the pkvm version into pkvm.c. Given
> that the allocation of that data-structure is done by the caller, that
> means we'd need to expose kvm_pgtable_get_pgd_size() or something that
> each MMU (pgtable.c and pkvm.c) would have to implement and things like
> that. But that felt like a bigger surgery, so I went with the simpler
> option. Thoughts welcome :-)

I really don't think it is too bad, and I'd rather keep some typing
than go the void * route. Some comments below.

> 
> Similarly, happy to drop the mappings_lock if we want to teach
> user_mem_abort() about taking a write lock on the mmu_lock in the pKVM
> case, but again this implementation is the least invasive into normal
> KVM so that felt like a reasonable starting point.
> ---
>  arch/arm64/include/asm/kvm_host.h    |   1 +
>  arch/arm64/include/asm/kvm_pgtable.h |  27 ++--
>  arch/arm64/include/asm/kvm_pkvm.h    |  28 ++++
>  arch/arm64/kvm/pkvm.c                | 195 +++++++++++++++++++++++++++
>  4 files changed, 242 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index f75988e3515b..05936b57a3a4 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -85,6 +85,7 @@ void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu);
>  struct kvm_hyp_memcache {
>  	phys_addr_t head;
>  	unsigned long nr_pages;
> +	struct pkvm_mapping *mapping; /* only used from EL1 */
>  };
>  
>  static inline void push_hyp_memcache(struct kvm_hyp_memcache *mc,
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 04418b5e3004..d24d18874015 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -412,15 +412,24 @@ static inline bool kvm_pgtable_walk_lock_held(void)
>   *			be used instead of block mappings.
>   */
>  struct kvm_pgtable {
> -	u32					ia_bits;
> -	s8					start_level;
> -	kvm_pteref_t				pgd;
> -	struct kvm_pgtable_mm_ops		*mm_ops;
> -
> -	/* Stage-2 only */
> -	struct kvm_s2_mmu			*mmu;
> -	enum kvm_pgtable_stage2_flags		flags;
> -	kvm_pgtable_force_pte_cb_t		force_pte_cb;
> +	union {
> +		struct {
> +			u32					ia_bits;
> +			s8					start_level;
> +			kvm_pteref_t				pgd;
> +			struct kvm_pgtable_mm_ops		*mm_ops;
> +
> +			/* Stage-2 only */
> +			struct kvm_s2_mmu			*mmu;
> +			enum kvm_pgtable_stage2_flags		flags;
> +			kvm_pgtable_force_pte_cb_t		force_pte_cb;
> +		};
> +		struct {
> +			struct kvm				*kvm;

Given that the kvm_s2_mmu already has a back-pointer to kvm_arch,
maybe you could keep that one common and use it?

There is also a baked-in assumption that non-NV always uses the
s2_mmu that's embedded in kvm_arch.
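The suggestion relies on the kvm_s2_mmu -> kvm_arch back-pointer: since kvm_arch is embedded in struct kvm, the enclosing struct kvm can be recovered with container_of(), which is essentially what kvm_s2_mmu_to_kvm() (already called in pkvm_pgtable_init()) does. A minimal userspace sketch with hypothetical, heavily trimmed structures; the real ones carry far more state.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of(), without type checks. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for struct kvm_arch, struct kvm and struct kvm_s2_mmu. */
struct kvm_arch_min {
	unsigned long handle;		/* think arch.pkvm.handle */
};

struct kvm_min {
	int id;
	struct kvm_arch_min arch;	/* embedded, not a pointer */
};

struct s2_mmu_min {
	struct kvm_arch_min *arch;	/* back-pointer */
};

/* Same shape as kvm_s2_mmu_to_kvm(): mmu -> arch -> enclosing kvm. */
static struct kvm_min *s2_mmu_to_kvm(struct s2_mmu_min *mmu)
{
	return container_of(mmu->arch, struct kvm_min, arch);
}

/* The VM handle can then be derived without storing a separate kvm pointer. */
static unsigned long pgt_handle(struct s2_mmu_min *mmu)
{
	return s2_mmu_to_kvm(mmu)->arch.handle;
}
```

With the mmu pointer kept common to both halves of the union, a helper like pkvm_pgt_to_handle() could go through this path instead of a dedicated kvm field.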

> +			struct rb_root				mappings;
> +			rwlock_t				mappings_lock;
> +		} pkvm;
> +	};
>  };
>  
>  /**
> diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
> index cd56acd9a842..84211d5daf87 100644
> --- a/arch/arm64/include/asm/kvm_pkvm.h
> +++ b/arch/arm64/include/asm/kvm_pkvm.h
> @@ -11,6 +11,12 @@
>  #include <linux/scatterlist.h>
>  #include <asm/kvm_pgtable.h>
>  
> +struct pkvm_mapping {
> +	u64 gfn;
> +	u64 pfn;
> +	struct rb_node node;

nit: make the node the first field.
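One concrete effect of this nit: with the rb_node as the first member its offset is zero, so rb_entry()/container_of() subtracts nothing and the node and the mapping share an address. Whether that is the motivation behind the request is an assumption. A sketch with hypothetical stand-in types:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct rb_node (three pointer-sized words). */
struct rb_node_stub {
	uintptr_t parent_color;
	struct rb_node_stub *right, *left;
};

/* Layout as posted: node last. */
struct mapping_node_last {
	uint64_t gfn, pfn;
	struct rb_node_stub node;
};

/* Layout as suggested: node first. */
struct mapping_node_first {
	struct rb_node_stub node;
	uint64_t gfn, pfn;
};

/* Compile-time check that node-to-entry conversion is a no-op here. */
static_assert(offsetof(struct mapping_node_first, node) == 0,
	      "rb node must be the first member");
```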

> +};
> +
>  /* Maximum number of VMs that can co-exist under pKVM. */
>  #define KVM_MAX_PVMS 255
>  
> @@ -137,4 +143,26 @@ static inline size_t pkvm_host_sve_state_size(void)
>  			SVE_SIG_REGS_SIZE(sve_vq_from_vl(kvm_host_sve_max_vl)));
>  }
>  
> +static inline pkvm_handle_t pkvm_pgt_to_handle(struct kvm_pgtable *pgt)
> +{
> +	return pgt->pkvm.kvm->arch.pkvm.handle;
> +}
> +
> +int pkvm_pgtable_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu, struct kvm_pgtable_mm_ops *mm_ops);
> +void pkvm_pgtable_destroy(struct kvm_pgtable *pgt);
> +int pkvm_pgtable_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
> +			   u64 phys, enum kvm_pgtable_prot prot,
> +			   void *mc, enum kvm_pgtable_walk_flags flags);
> +int pkvm_pgtable_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size);
> +int pkvm_pgtable_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size);
> +int pkvm_pgtable_flush(struct kvm_pgtable *pgt, u64 addr, u64 size);
> +bool pkvm_pgtable_test_clear_young(struct kvm_pgtable *pgt, u64 addr, u64 size, bool mkold);
> +int pkvm_pgtable_relax_perms(struct kvm_pgtable *pgt, u64 addr, enum kvm_pgtable_prot prot,
> +			     enum kvm_pgtable_walk_flags flags);
> +void pkvm_pgtable_mkyoung(struct kvm_pgtable *pgt, u64 addr, enum kvm_pgtable_walk_flags flags);
> +int pkvm_pgtable_split(struct kvm_pgtable *pgt, u64 addr, u64 size, struct kvm_mmu_memory_cache *mc);
> +void pkvm_pgtable_free_unlinked(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, s8 level);
> +kvm_pte_t *pkvm_pgtable_create_unlinked(struct kvm_pgtable *pgt, u64 phys, s8 level,
> +					enum kvm_pgtable_prot prot, void *mc, bool force_pte);
> +
>  #endif	/* __ARM64_KVM_PKVM_H__ */
> diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
> index 85117ea8f351..9c648a510671 100644
> --- a/arch/arm64/kvm/pkvm.c
> +++ b/arch/arm64/kvm/pkvm.c
> @@ -7,6 +7,7 @@
>  #include <linux/init.h>
>  #include <linux/kmemleak.h>
>  #include <linux/kvm_host.h>
> +#include <asm/kvm_mmu.h>
>  #include <linux/memblock.h>
>  #include <linux/mutex.h>
>  #include <linux/sort.h>
> @@ -268,3 +269,197 @@ static int __init finalize_pkvm(void)
>  	return ret;
>  }
>  device_initcall_sync(finalize_pkvm);
> +
> +static int cmp_mappings(struct rb_node *node, const struct rb_node *parent)
> +{
> +	struct pkvm_mapping *a = rb_entry(node, struct pkvm_mapping, node);
> +	struct pkvm_mapping *b = rb_entry(parent, struct pkvm_mapping, node);
> +
> +	if (a->gfn < b->gfn)
> +		return -1;
> +	if (a->gfn > b->gfn)
> +		return 1;
> +	return 0;
> +}
> +
> +static struct rb_node *find_first_mapping_node(struct rb_root *root, u64 gfn)
> +{
> +	struct rb_node *node = root->rb_node, *prev = NULL;
> +	struct pkvm_mapping *mapping;
> +
> +	while (node) {
> +		mapping = rb_entry(node, struct pkvm_mapping, node);
> +		if (mapping->gfn == gfn)
> +			return node;
> +		prev = node;
> +		node = (gfn < mapping->gfn) ? node->rb_left : node->rb_right;
> +	}
> +
> +	return prev;
> +}
> +
> +#define for_each_mapping_in_range(pgt, start_ipa, end_ipa, mapping, tmp)				\
> +	for (tmp = find_first_mapping_node(&pgt->pkvm.mappings, ((start_ipa) >> PAGE_SHIFT));		\
> +	     tmp && ({ mapping = rb_entry(tmp, struct pkvm_mapping, node); tmp = rb_next(tmp); 1; });)	\
> +		if (mapping->gfn < ((start_ipa) >> PAGE_SHIFT))						\
> +			continue;									\
> +		else if (mapping->gfn >= ((end_ipa) >> PAGE_SHIFT))					\
> +			break;										\
> +		else

Oh gawd... This makes my head spin, and it can't be said that I'm
averse to the most bizarre macro constructs. I came up with this:

diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 9c648a5106717..b1b8501cae8f7 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -298,13 +298,19 @@ static struct rb_node *find_first_mapping_node(struct rb_root *root, u64 gfn)
 	return prev;
 }
 
-#define for_each_mapping_in_range(pgt, start_ipa, end_ipa, mapping, tmp)				\
-	for (tmp = find_first_mapping_node(&pgt->pkvm.mappings, ((start_ipa) >> PAGE_SHIFT));		\
-	     tmp && ({ mapping = rb_entry(tmp, struct pkvm_mapping, node); tmp = rb_next(tmp); 1; });)	\
-		if (mapping->gfn < ((start_ipa) >> PAGE_SHIFT))						\
-			continue;									\
-		else if (mapping->gfn >= ((end_ipa) >> PAGE_SHIFT))					\
-			break;										\
+#define for_each_mapping_in_range(__pgt, __start, __end, __map)				 \
+	for (struct rb_node *__tmp = find_first_mapping_node(&__pgt->pkvm.mappings, 	 \
+							     ((__start) >> PAGE_SHIFT)); \
+	     __tmp && ({								 \
+			     __map = rb_entry(__tmp, struct pkvm_mapping, node); 	 \
+			     __tmp = rb_next(__tmp);					 \
+			     true;							 \
+		     });								 \
+	     )										 \
+		if (__map->gfn < ((__start) >> PAGE_SHIFT))				 \
+			continue;							 \
+		else if (__map->gfn >= ((__end) >> PAGE_SHIFT))				 \
+			break;								 \
 		else
 
 int pkvm_pgtable_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu, struct kvm_pgtable_mm_ops *mm_ops)
@@ -371,11 +377,10 @@ int pkvm_pgtable_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
 {
 	pkvm_handle_t handle = pkvm_pgt_to_handle(pgt);
 	struct pkvm_mapping *mapping;
-	struct rb_node *tmp;
 	int ret = 0;
 
 	write_lock(&pgt->pkvm.mappings_lock);
-	for_each_mapping_in_range(pgt, addr, addr + size, mapping, tmp) {
+	for_each_mapping_in_range(pgt, addr, addr + size, mapping) {
 		ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_guest, handle, mapping->gfn);
 		if (WARN_ON(ret))
 			break;
@@ -392,11 +397,10 @@ int pkvm_pgtable_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
 {
 	pkvm_handle_t handle = pkvm_pgt_to_handle(pgt);
 	struct pkvm_mapping *mapping;
-	struct rb_node *tmp;
 	int ret = 0;
 
 	read_lock(&pgt->pkvm.mappings_lock);
-	for_each_mapping_in_range(pgt, addr, addr + size, mapping, tmp) {
+	for_each_mapping_in_range(pgt, addr, addr + size, mapping) {
 		ret = kvm_call_hyp_nvhe(__pkvm_host_wrprotect_guest, handle, mapping->gfn);
 		if (WARN_ON(ret))
 			break;
@@ -409,10 +413,9 @@ int pkvm_pgtable_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
 int pkvm_pgtable_flush(struct kvm_pgtable *pgt, u64 addr, u64 size)
 {
 	struct pkvm_mapping *mapping;
-	struct rb_node *tmp;
 
 	read_lock(&pgt->pkvm.mappings_lock);
-	for_each_mapping_in_range(pgt, addr, addr + size, mapping, tmp)
+	for_each_mapping_in_range(pgt, addr, addr + size, mapping)
 		__clean_dcache_guest_page(pfn_to_kaddr(mapping->pfn), PAGE_SIZE);
 	read_unlock(&pgt->pkvm.mappings_lock);
 
@@ -423,11 +426,10 @@ bool pkvm_pgtable_test_clear_young(struct kvm_pgtable *pgt, u64 addr, u64 size,
 {
 	pkvm_handle_t handle = pkvm_pgt_to_handle(pgt);
 	struct pkvm_mapping *mapping;
-	struct rb_node *tmp;
 	bool young = false;
 
 	read_lock(&pgt->pkvm.mappings_lock);
-	for_each_mapping_in_range(pgt, addr, addr + size, mapping, tmp)
+	for_each_mapping_in_range(pgt, addr, addr + size, mapping)
 		young |= kvm_call_hyp_nvhe(__pkvm_host_test_clear_young_guest, handle, mapping->gfn,
 					   mkold);
 	read_unlock(&pgt->pkvm.mappings_lock);

Should be semantically equivalent, but I find it way more readable.

Also, maybe add a comment indicating why __tmp needs to be updated
*before* the body of the loop gets executed (case of freeing the
mapping from within the body).
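To illustrate why the next pointer must be fetched before the body runs, here is a userspace sketch of the same "safe" range-iteration pattern. To stay self-contained it swaps the kernel rb-tree for a sorted doubly linked list and starts from the head rather than calling find_first_mapping_node(); struct mapping, mapping_add(), mapping_remove(), unmap_range() and for_each_in_range_safe() are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical mapping record; the kernel version lives in an rb-tree. */
struct mapping {
	struct mapping *prev, *next;
	uint64_t gfn, pfn;
};

/* List kept sorted by gfn. */
struct mapping_list {
	struct mapping *first;
};

/* Insert in gfn order; returns NULL on allocation failure. */
static struct mapping *mapping_add(struct mapping_list *l, uint64_t gfn, uint64_t pfn)
{
	struct mapping *m = calloc(1, sizeof(*m));
	struct mapping *prev = NULL, *cur = l->first;

	if (!m)
		return NULL;
	m->gfn = gfn;
	m->pfn = pfn;
	while (cur && cur->gfn < gfn) {
		prev = cur;
		cur = cur->next;
	}
	m->prev = prev;
	m->next = cur;
	if (prev)
		prev->next = m;
	else
		l->first = m;
	if (cur)
		cur->prev = m;
	return m;
}

/* Unlink and free @m (the erase-and-free step in the real code). */
static void mapping_remove(struct mapping_list *l, struct mapping *m)
{
	if (m->prev)
		m->prev->next = m->next;
	else
		l->first = m->next;
	if (m->next)
		m->next->prev = m->prev;
	free(m);
}

/*
 * Visit every mapping with start <= gfn < end. The cursor _next is
 * advanced *before* the body runs, so the body may free map.
 */
#define for_each_in_range_safe(list, start, end, map)			\
	for (struct mapping *_next = (list)->first;			\
	     _next && ((map) = _next, _next = _next->next, 1);)	\
		if ((map)->gfn < (start))				\
			continue;					\
		else if ((map)->gfn >= (end))				\
			break;						\
		else

/* Free every mapping in [start, end) and return how many were freed. */
static int unmap_range(struct mapping_list *l, uint64_t start, uint64_t end)
{
	struct mapping *m;
	int freed = 0;

	for_each_in_range_safe(l, start, end, m) {
		mapping_remove(l, m);	/* safe: _next already points past m */
		freed++;
	}
	return freed;
}
```

If the macro instead read map->next after the body, unmap_range() would dereference freed memory on every iteration that removes a node.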

> +
> +int pkvm_pgtable_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu, struct kvm_pgtable_mm_ops *mm_ops)
> +{
> +	pgt->pkvm.kvm		= kvm_s2_mmu_to_kvm(mmu);
> +	pgt->pkvm.mappings	= RB_ROOT;
> +	rwlock_init(&pgt->pkvm.mappings_lock);

We talked about this f2f: Given that this lock is semantically
equivalent to the MMU lock, maybe just use that by upgrading it to be
taken for write when pKVM is enabled.

It should be easy enough to wrap that in helpers that DTRT, and all
this code could become devoid of any extra locking.

[...]

> +int pkvm_pgtable_relax_perms(struct kvm_pgtable *pgt, u64 addr, enum kvm_pgtable_prot prot,
> +			     enum kvm_pgtable_walk_flags flags)
> +{
> +	return kvm_call_hyp_nvhe(__pkvm_host_relax_guest_perms, addr >> PAGE_SHIFT, prot);
> +}
> +
> +void pkvm_pgtable_mkyoung(struct kvm_pgtable *pgt, u64 addr, enum kvm_pgtable_walk_flags flags)
> +{
> +	WARN_ON(kvm_call_hyp_nvhe(__pkvm_host_mkyoung_guest, addr >> PAGE_SHIFT));
> +}
> +
> +void pkvm_pgtable_free_unlinked(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, s8 level)
> +{
> +	WARN_ON(1);
> +}
> +
> +kvm_pte_t *pkvm_pgtable_create_unlinked(struct kvm_pgtable *pgt, u64 phys, s8 level,
> +					enum kvm_pgtable_prot prot, void *mc, bool force_pte)
> +{
> +	WARN_ON(1);
> +	return NULL;
> +}
> +
> +int pkvm_pgtable_split(struct kvm_pgtable *pgt, u64 addr, u64 size, struct kvm_mmu_memory_cache *mc)
> +{
> +	WARN_ON(1);
> +	return -EINVAL;
> +}

Maybe turn these warnings into their _ONCE version. If we end up here,
seeing it once should be enough to realise we're toast.
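With the _ONCE variants, repeated hits on one of these stubs produce a single report rather than one per call. The per-call-site behaviour comes from a static flag in each macro expansion. A simplified userspace sketch follows; WARN_ON_ONCE_SKETCH(), warn_count and split_stub() are hypothetical names, and the real kernel macro does considerably more (backtrace, taint).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Number of warnings actually printed, so the "once" behaviour is observable. */
static int warn_count;

/*
 * Evaluates to the condition, but reports at most once per call site
 * because each expansion gets its own static flag. Relies on GNU C
 * statement expressions (GCC and Clang).
 */
#define WARN_ON_ONCE_SKETCH(cond) ({					\
	static bool warned_;						\
	bool ret_ = !!(cond);						\
	if (ret_ && !warned_) {						\
		warned_ = true;						\
		warn_count++;						\
		fprintf(stderr, "WARNING: %s:%d\n", __FILE__, __LINE__); \
	}								\
	ret_;								\
})

/* Stub for an operation that must never be reached under pKVM. */
static int split_stub(void)
{
	WARN_ON_ONCE_SKETCH(1);
	return -EINVAL;
}
```

Calling split_stub() any number of times prints one warning, while the stub still returns -EINVAL on every call.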

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.



* Re: [PATCH v2 17/18] KVM: arm64: Introduce the EL1 pKVM MMU
  2024-12-12 11:35   ` Marc Zyngier
@ 2024-12-12 12:03     ` Quentin Perret
  0 siblings, 0 replies; 50+ messages in thread
From: Quentin Perret @ 2024-12-12 12:03 UTC
  To: Marc Zyngier
  Cc: Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	Catalin Marinas, Will Deacon, Fuad Tabba, Vincent Donnefort,
	Sebastian Ene, linux-arm-kernel, kvmarm, linux-kernel

On Thursday 12 Dec 2024 at 11:35:09 (+0000), Marc Zyngier wrote:
> On Tue, 03 Dec 2024 10:37:34 +0000,
> Quentin Perret <qperret@google.com> wrote:
> > 
> > Introduce a set of helper functions for manipulating the pKVM
> > guest stage-2 page-tables from EL1 using pKVM's HVC interface.
> > 
> > Each helper has an exact one-to-one correspondence with the traditional
> > kvm_pgtable_stage2_*() functions from pgtable.c, with a strictly
> > matching prototype. This will ease plumbing later on in mmu.c.
> > 
> > These callbacks track the gfn->pfn mappings in a simple rb_tree indexed
> > by IPA in lieu of a page-table. This rb-tree is kept in sync with pKVM's
> > state and is protected by a new rwlock -- the existing mmu_lock
> > protection does not suffice in the map() path where the tree must be
> > modified while user_mem_abort() only acquires a read_lock.
> > 
> > Signed-off-by: Quentin Perret <qperret@google.com>
> > ---
> > 
> > The embedded union inside struct kvm_pgtable is arguably a bit horrible
> > currently... I considered making the pgt argument to all kvm_pgtable_*()
> > functions an opaque void * ptr, and moving the definition of
> > struct kvm_pgtable to pgtable.c and the pkvm version into pkvm.c. Given
> > that the allocation of that data-structure is done by the caller, that
> > means we'd need to expose kvm_pgtable_get_pgd_size() or something that
> > each MMU (pgtable.c and pkvm.c) would have to implement and things like
> > that. But that felt like a bigger surgery, so I went with the simpler
> > option. Thoughts welcome :-)
> 
> I really don't think it is too bad, and I'd rather keep some typing
> than go the void * route. Some comments below.

Sounds good.

> > 
> > Similarly, happy to drop the mappings_lock if we want to teach
> > user_mem_abort() about taking a write lock on the mmu_lock in the pKVM
> > case, but again this implementation is the least invasive into normal
> > KVM so that felt like a reasonable starting point.
> > ---
> >  arch/arm64/include/asm/kvm_host.h    |   1 +
> >  arch/arm64/include/asm/kvm_pgtable.h |  27 ++--
> >  arch/arm64/include/asm/kvm_pkvm.h    |  28 ++++
> >  arch/arm64/kvm/pkvm.c                | 195 +++++++++++++++++++++++++++
> >  4 files changed, 242 insertions(+), 9 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index f75988e3515b..05936b57a3a4 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -85,6 +85,7 @@ void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu);
> >  struct kvm_hyp_memcache {
> >  	phys_addr_t head;
> >  	unsigned long nr_pages;
> > +	struct pkvm_mapping *mapping; /* only used from EL1 */
> >  };
> >  
> >  static inline void push_hyp_memcache(struct kvm_hyp_memcache *mc,
> > diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> > index 04418b5e3004..d24d18874015 100644
> > --- a/arch/arm64/include/asm/kvm_pgtable.h
> > +++ b/arch/arm64/include/asm/kvm_pgtable.h
> > @@ -412,15 +412,24 @@ static inline bool kvm_pgtable_walk_lock_held(void)
> >   *			be used instead of block mappings.
> >   */
> >  struct kvm_pgtable {
> > -	u32					ia_bits;
> > -	s8					start_level;
> > -	kvm_pteref_t				pgd;
> > -	struct kvm_pgtable_mm_ops		*mm_ops;
> > -
> > -	/* Stage-2 only */
> > -	struct kvm_s2_mmu			*mmu;
> > -	enum kvm_pgtable_stage2_flags		flags;
> > -	kvm_pgtable_force_pte_cb_t		force_pte_cb;
> > +	union {
> > +		struct {
> > +			u32					ia_bits;
> > +			s8					start_level;
> > +			kvm_pteref_t				pgd;
> > +			struct kvm_pgtable_mm_ops		*mm_ops;
> > +
> > +			/* Stage-2 only */
> > +			struct kvm_s2_mmu			*mmu;
> > +			enum kvm_pgtable_stage2_flags		flags;
> > +			kvm_pgtable_force_pte_cb_t		force_pte_cb;
> > +		};
> > +		struct {
> > +			struct kvm				*kvm;
> 
> Given that the kvm_s2_mmu already has a back-pointer to kvm_arch,
> maybe you could keep that one common and use it?
> 
> There is also a baked-in assumption that non-NV always uses the
> s2_mmu that's embedded in kvm_arch.

Right, what I need is one kvm_s2_mmu_to_kvm() away, so that should work
nicely. And as discussed below, I'll try ditching mappings_lock, so we
should be left with the rb_root on its own.

> > +			struct rb_root				mappings;
> > +			rwlock_t				mappings_lock;
> > +		} pkvm;
> > +	};
> >  };
> >  
> >  /**
> > diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
> > index cd56acd9a842..84211d5daf87 100644
> > --- a/arch/arm64/include/asm/kvm_pkvm.h
> > +++ b/arch/arm64/include/asm/kvm_pkvm.h
> > @@ -11,6 +11,12 @@
> >  #include <linux/scatterlist.h>
> >  #include <asm/kvm_pgtable.h>
> >  
> > +struct pkvm_mapping {
> > +	u64 gfn;
> > +	u64 pfn;
> > +	struct rb_node node;
> 
> nit: make the node the first field.

Ack.

> > +};
> > +
> >  /* Maximum number of VMs that can co-exist under pKVM. */
> >  #define KVM_MAX_PVMS 255
> >  
> > @@ -137,4 +143,26 @@ static inline size_t pkvm_host_sve_state_size(void)
> >  			SVE_SIG_REGS_SIZE(sve_vq_from_vl(kvm_host_sve_max_vl)));
> >  }
> >  
> > +static inline pkvm_handle_t pkvm_pgt_to_handle(struct kvm_pgtable *pgt)
> > +{
> > +	return pgt->pkvm.kvm->arch.pkvm.handle;
> > +}
> > +
> > +int pkvm_pgtable_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu, struct kvm_pgtable_mm_ops *mm_ops);
> > +void pkvm_pgtable_destroy(struct kvm_pgtable *pgt);
> > +int pkvm_pgtable_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
> > +			   u64 phys, enum kvm_pgtable_prot prot,
> > +			   void *mc, enum kvm_pgtable_walk_flags flags);
> > +int pkvm_pgtable_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size);
> > +int pkvm_pgtable_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size);
> > +int pkvm_pgtable_flush(struct kvm_pgtable *pgt, u64 addr, u64 size);
> > +bool pkvm_pgtable_test_clear_young(struct kvm_pgtable *pgt, u64 addr, u64 size, bool mkold);
> > +int pkvm_pgtable_relax_perms(struct kvm_pgtable *pgt, u64 addr, enum kvm_pgtable_prot prot,
> > +			     enum kvm_pgtable_walk_flags flags);
> > +void pkvm_pgtable_mkyoung(struct kvm_pgtable *pgt, u64 addr, enum kvm_pgtable_walk_flags flags);
> > +int pkvm_pgtable_split(struct kvm_pgtable *pgt, u64 addr, u64 size, struct kvm_mmu_memory_cache *mc);
> > +void pkvm_pgtable_free_unlinked(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, s8 level);
> > +kvm_pte_t *pkvm_pgtable_create_unlinked(struct kvm_pgtable *pgt, u64 phys, s8 level,
> > +					enum kvm_pgtable_prot prot, void *mc, bool force_pte);
> > +
> >  #endif	/* __ARM64_KVM_PKVM_H__ */
> > diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
> > index 85117ea8f351..9c648a510671 100644
> > --- a/arch/arm64/kvm/pkvm.c
> > +++ b/arch/arm64/kvm/pkvm.c
> > @@ -7,6 +7,7 @@
> >  #include <linux/init.h>
> >  #include <linux/kmemleak.h>
> >  #include <linux/kvm_host.h>
> > +#include <asm/kvm_mmu.h>
> >  #include <linux/memblock.h>
> >  #include <linux/mutex.h>
> >  #include <linux/sort.h>
> > @@ -268,3 +269,197 @@ static int __init finalize_pkvm(void)
> >  	return ret;
> >  }
> >  device_initcall_sync(finalize_pkvm);
> > +
> > +static int cmp_mappings(struct rb_node *node, const struct rb_node *parent)
> > +{
> > +	struct pkvm_mapping *a = rb_entry(node, struct pkvm_mapping, node);
> > +	struct pkvm_mapping *b = rb_entry(parent, struct pkvm_mapping, node);
> > +
> > +	if (a->gfn < b->gfn)
> > +		return -1;
> > +	if (a->gfn > b->gfn)
> > +		return 1;
> > +	return 0;
> > +}
> > +
> > +static struct rb_node *find_first_mapping_node(struct rb_root *root, u64 gfn)
> > +{
> > +	struct rb_node *node = root->rb_node, *prev = NULL;
> > +	struct pkvm_mapping *mapping;
> > +
> > +	while (node) {
> > +		mapping = rb_entry(node, struct pkvm_mapping, node);
> > +		if (mapping->gfn == gfn)
> > +			return node;
> > +		prev = node;
> > +		node = (gfn < mapping->gfn) ? node->rb_left : node->rb_right;
> > +	}
> > +
> > +	return prev;
> > +}
> > +
> > +#define for_each_mapping_in_range(pgt, start_ipa, end_ipa, mapping, tmp)				\
> > +	for (tmp = find_first_mapping_node(&pgt->pkvm.mappings, ((start_ipa) >> PAGE_SHIFT));		\
> > +	     tmp && ({ mapping = rb_entry(tmp, struct pkvm_mapping, node); tmp = rb_next(tmp); 1; });)	\
> > +		if (mapping->gfn < ((start_ipa) >> PAGE_SHIFT))						\
> > +			continue;									\
> > +		else if (mapping->gfn >= ((end_ipa) >> PAGE_SHIFT))					\
> > +			break;										\
> > +		else
> 
> Oh gawd... This makes my head spin, and it can't be said that I'm
> averse to the most bizarre macro constructs. I came up with this:
> 
> diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
> index 9c648a5106717..b1b8501cae8f7 100644
> --- a/arch/arm64/kvm/pkvm.c
> +++ b/arch/arm64/kvm/pkvm.c
> @@ -298,13 +298,19 @@ static struct rb_node *find_first_mapping_node(struct rb_root *root, u64 gfn)
>  	return prev;
>  }
>  
> -#define for_each_mapping_in_range(pgt, start_ipa, end_ipa, mapping, tmp)				\
> -	for (tmp = find_first_mapping_node(&pgt->pkvm.mappings, ((start_ipa) >> PAGE_SHIFT));		\
> -	     tmp && ({ mapping = rb_entry(tmp, struct pkvm_mapping, node); tmp = rb_next(tmp); 1; });)	\
> -		if (mapping->gfn < ((start_ipa) >> PAGE_SHIFT))						\
> -			continue;									\
> -		else if (mapping->gfn >= ((end_ipa) >> PAGE_SHIFT))					\
> -			break;										\
> +#define for_each_mapping_in_range(__pgt, __start, __end, __map)				 \
> +	for (struct rb_node *__tmp = find_first_mapping_node(&__pgt->pkvm.mappings, 	 \
> +							     ((__start) >> PAGE_SHIFT)); \
> +	     __tmp && ({								 \
> +			     __map = rb_entry(__tmp, struct pkvm_mapping, node); 	 \
> +			     __tmp = rb_next(__tmp);					 \
> +			     true;							 \
> +		     });								 \
> +	     )										 \
> +		if (__map->gfn < ((__start) >> PAGE_SHIFT))				 \
> +			continue;							 \
> +		else if (__map->gfn >= ((__end) >> PAGE_SHIFT))				 \
> +			break;								 \
>  		else
>  
>  int pkvm_pgtable_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu, struct kvm_pgtable_mm_ops *mm_ops)
> @@ -371,11 +377,10 @@ int pkvm_pgtable_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
>  {
>  	pkvm_handle_t handle = pkvm_pgt_to_handle(pgt);
>  	struct pkvm_mapping *mapping;
> -	struct rb_node *tmp;
>  	int ret = 0;
>  
>  	write_lock(&pgt->pkvm.mappings_lock);
> -	for_each_mapping_in_range(pgt, addr, addr + size, mapping, tmp) {
> +	for_each_mapping_in_range(pgt, addr, addr + size, mapping) {
>  		ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_guest, handle, mapping->gfn);
>  		if (WARN_ON(ret))
>  			break;
> @@ -392,11 +397,10 @@ int pkvm_pgtable_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
>  {
>  	pkvm_handle_t handle = pkvm_pgt_to_handle(pgt);
>  	struct pkvm_mapping *mapping;
> -	struct rb_node *tmp;
>  	int ret = 0;
>  
>  	read_lock(&pgt->pkvm.mappings_lock);
> -	for_each_mapping_in_range(pgt, addr, addr + size, mapping, tmp) {
> +	for_each_mapping_in_range(pgt, addr, addr + size, mapping) {
>  		ret = kvm_call_hyp_nvhe(__pkvm_host_wrprotect_guest, handle, mapping->gfn);
>  		if (WARN_ON(ret))
>  			break;
> @@ -409,10 +413,9 @@ int pkvm_pgtable_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
>  int pkvm_pgtable_flush(struct kvm_pgtable *pgt, u64 addr, u64 size)
>  {
>  	struct pkvm_mapping *mapping;
> -	struct rb_node *tmp;
>  
>  	read_lock(&pgt->pkvm.mappings_lock);
> -	for_each_mapping_in_range(pgt, addr, addr + size, mapping, tmp)
> +	for_each_mapping_in_range(pgt, addr, addr + size, mapping)
>  		__clean_dcache_guest_page(pfn_to_kaddr(mapping->pfn), PAGE_SIZE);
>  	read_unlock(&pgt->pkvm.mappings_lock);
>  
> @@ -423,11 +426,10 @@ bool pkvm_pgtable_test_clear_young(struct kvm_pgtable *pgt, u64 addr, u64 size,
>  {
>  	pkvm_handle_t handle = pkvm_pgt_to_handle(pgt);
>  	struct pkvm_mapping *mapping;
> -	struct rb_node *tmp;
>  	bool young = false;
>  
>  	read_lock(&pgt->pkvm.mappings_lock);
> -	for_each_mapping_in_range(pgt, addr, addr + size, mapping, tmp)
> +	for_each_mapping_in_range(pgt, addr, addr + size, mapping)
>  		young |= kvm_call_hyp_nvhe(__pkvm_host_test_clear_young_guest, handle, mapping->gfn,
>  					   mkold);
>  	read_unlock(&pgt->pkvm.mappings_lock);
> 
> Should be semantically equivalent, but I find it way more readable.

Yep, declaring __tmp within the loop itself is much nicer, it certainly
doesn't need to outlive it. I'll fold that in, thanks!

> Also, maybe add a comment indicating why __tmp needs to be updated
> *before* the body of the loop gets executed (case of freeing the
> mapping from within the body).

Ack, and I might rename the macro to for_each_mapping_in_range_safe()
as well to make it clear we have that property.

> > +
> > +int pkvm_pgtable_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu, struct kvm_pgtable_mm_ops *mm_ops)
> > +{
> > +	pgt->pkvm.kvm		= kvm_s2_mmu_to_kvm(mmu);
> > +	pgt->pkvm.mappings	= RB_ROOT;
> > +	rwlock_init(&pgt->pkvm.mappings_lock);
> 
> We talked about this f2f: Given that this lock is semantically
> equivalent to the MMU lock, maybe just use that by upgrading it to be
> taken for write when pKVM is enabled.
> 
> It should be easy enough to wrap that in helpers that DTRT, and all
> this code could become devoid of any extra locking.

OK, I'll also add lockdep assertions to document and enforce the
locking requirements from here.

> > +int pkvm_pgtable_relax_perms(struct kvm_pgtable *pgt, u64 addr, enum kvm_pgtable_prot prot,
> > +			     enum kvm_pgtable_walk_flags flags)
> > +{
> > +	return kvm_call_hyp_nvhe(__pkvm_host_relax_guest_perms, addr >> PAGE_SHIFT, prot);
> > +}
> > +
> > +void pkvm_pgtable_mkyoung(struct kvm_pgtable *pgt, u64 addr, enum kvm_pgtable_walk_flags flags)
> > +{
> > +	WARN_ON(kvm_call_hyp_nvhe(__pkvm_host_mkyoung_guest, addr >> PAGE_SHIFT));
> > +}
> > +
> > +void pkvm_pgtable_free_unlinked(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, s8 level)
> > +{
> > +	WARN_ON(1);
> > +}
> > +
> > +kvm_pte_t *pkvm_pgtable_create_unlinked(struct kvm_pgtable *pgt, u64 phys, s8 level,
> > +					enum kvm_pgtable_prot prot, void *mc, bool force_pte)
> > +{
> > +	WARN_ON(1);
> > +	return NULL;
> > +}
> > +
> > +int pkvm_pgtable_split(struct kvm_pgtable *pgt, u64 addr, u64 size, struct kvm_mmu_memory_cache *mc)
> > +{
> > +	WARN_ON(1);
> > +	return -EINVAL;
> > +}
> 
> Maybe turn these warnings into their _ONCE version. If we end up here,
> seeing it once should be enough to realise we're toast.

Will do.

Cheers,
Quentin



end of thread, other threads:[~2024-12-12 12:05 UTC | newest]

Thread overview: 50+ messages
2024-12-03 10:37 [PATCH v2 00/18] KVM: arm64: Non-protected guest stage-2 support for pKVM Quentin Perret
2024-12-03 10:37 ` [PATCH v2 01/18] KVM: arm64: Change the layout of enum pkvm_page_state Quentin Perret
2024-12-10 12:59   ` Fuad Tabba
2024-12-10 15:15     ` Quentin Perret
2024-12-03 10:37 ` [PATCH v2 02/18] KVM: arm64: Move enum pkvm_page_state to memory.h Quentin Perret
2024-12-03 10:37 ` [PATCH v2 03/18] KVM: arm64: Make hyp_page::order a u8 Quentin Perret
2024-12-03 10:37 ` [PATCH v2 04/18] KVM: arm64: Move host page ownership tracking to the hyp vmemmap Quentin Perret
2024-12-10 13:02   ` Fuad Tabba
2024-12-10 15:29     ` Quentin Perret
2024-12-10 15:46       ` Fuad Tabba
2024-12-03 10:37 ` [PATCH v2 05/18] KVM: arm64: Pass walk flags to kvm_pgtable_stage2_mkyoung Quentin Perret
2024-12-03 10:37 ` [PATCH v2 06/18] KVM: arm64: Pass walk flags to kvm_pgtable_stage2_relax_perms Quentin Perret
2024-12-03 10:37 ` [PATCH v2 07/18] KVM: arm64: Make kvm_pgtable_stage2_init() a static inline function Quentin Perret
2024-12-03 10:37 ` [PATCH v2 08/18] KVM: arm64: Add {get,put}_pkvm_hyp_vm() helpers Quentin Perret
2024-12-03 10:37 ` [PATCH v2 09/18] KVM: arm64: Introduce __pkvm_vcpu_{load,put}() Quentin Perret
2024-12-03 10:37 ` [PATCH v2 10/18] KVM: arm64: Introduce __pkvm_host_share_guest() Quentin Perret
2024-12-10 13:58   ` Fuad Tabba
2024-12-10 15:41     ` Quentin Perret
2024-12-10 15:51       ` Fuad Tabba
2024-12-11  9:58         ` Quentin Perret
2024-12-11 10:07           ` Fuad Tabba
2024-12-11 10:14             ` Quentin Perret
2024-12-11 10:21               ` Quentin Perret
2024-12-11 10:32                 ` Fuad Tabba
2024-12-03 10:37 ` [PATCH v2 11/18] KVM: arm64: Introduce __pkvm_host_unshare_guest() Quentin Perret
2024-12-10 14:41   ` Fuad Tabba
2024-12-10 15:53     ` Quentin Perret
2024-12-10 15:57       ` Fuad Tabba
2024-12-03 10:37 ` [PATCH v2 12/18] KVM: arm64: Introduce __pkvm_host_relax_guest_perms() Quentin Perret
2024-12-10 14:56   ` Fuad Tabba
2024-12-11  8:57     ` Quentin Perret
2024-12-03 10:37 ` [PATCH v2 13/18] KVM: arm64: Introduce __pkvm_host_wrprotect_guest() Quentin Perret
2024-12-10 15:06   ` Fuad Tabba
2024-12-10 19:38     ` Quentin Perret
2024-12-03 10:37 ` [PATCH v2 14/18] KVM: arm64: Introduce __pkvm_host_test_clear_young_guest() Quentin Perret
2024-12-10 15:11   ` Fuad Tabba
2024-12-10 19:39     ` Quentin Perret
2024-12-03 10:37 ` [PATCH v2 15/18] KVM: arm64: Introduce __pkvm_host_mkyoung_guest() Quentin Perret
2024-12-10 15:14   ` Fuad Tabba
2024-12-10 19:46     ` Quentin Perret
2024-12-11 10:11       ` Fuad Tabba
2024-12-11 10:18         ` Quentin Perret
2024-12-03 10:37 ` [PATCH v2 16/18] KVM: arm64: Introduce __pkvm_tlb_flush_vmid() Quentin Perret
2024-12-10 15:23   ` Fuad Tabba
2024-12-11 10:03     ` Quentin Perret
2024-12-11 10:21       ` Fuad Tabba
2024-12-03 10:37 ` [PATCH v2 17/18] KVM: arm64: Introduce the EL1 pKVM MMU Quentin Perret
2024-12-12 11:35   ` Marc Zyngier
2024-12-12 12:03     ` Quentin Perret
2024-12-03 10:37 ` [PATCH v2 18/18] KVM: arm64: Plumb the pKVM MMU in KVM Quentin Perret
