* [PATCH 01/18] KVM: arm64: Change the layout of enum pkvm_page_state
2024-11-04 13:31 [PATCH 00/18] KVM: arm64: Non-protected guest stage-2 support for pKVM Quentin Perret
@ 2024-11-04 13:31 ` Quentin Perret
2024-11-04 17:31 ` Sebastian Ene
2024-11-04 13:31 ` [PATCH 02/18] KVM: arm64: Move enum pkvm_page_state to memory.h Quentin Perret
` (16 subsequent siblings)
17 siblings, 1 reply; 24+ messages in thread
From: Quentin Perret @ 2024-11-04 13:31 UTC
To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Catalin Marinas, Will Deacon
Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
kvmarm, linux-kernel
The 'concrete' (a.k.a. non-meta) page states are currently encoded using
software bits in PTEs. For performance reasons, the abstract
pkvm_page_state enum uses the same bits to encode these states as that
makes conversions to and from PTEs easy.
In order to prepare the ground for moving the 'concrete' state storage
to the hyp vmemmap, re-arrange the enum to use bits 0 and 1 for this
purpose.
No functional changes intended.
Signed-off-by: Quentin Perret <qperret@google.com>
---
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 0972faccc2af..ca3177481b78 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -24,25 +24,28 @@
*/
enum pkvm_page_state {
PKVM_PAGE_OWNED = 0ULL,
- PKVM_PAGE_SHARED_OWNED = KVM_PGTABLE_PROT_SW0,
- PKVM_PAGE_SHARED_BORROWED = KVM_PGTABLE_PROT_SW1,
- __PKVM_PAGE_RESERVED = KVM_PGTABLE_PROT_SW0 |
- KVM_PGTABLE_PROT_SW1,
+ PKVM_PAGE_SHARED_OWNED = BIT(0),
+ PKVM_PAGE_SHARED_BORROWED = BIT(1),
+ __PKVM_PAGE_RESERVED = BIT(0) | BIT(1),
/* Meta-states which aren't encoded directly in the PTE's SW bits */
- PKVM_NOPAGE,
+ PKVM_NOPAGE = BIT(2),
};
+#define PKVM_PAGE_META_STATES_MASK (~(BIT(0) | BIT(1)))
#define PKVM_PAGE_STATE_PROT_MASK (KVM_PGTABLE_PROT_SW0 | KVM_PGTABLE_PROT_SW1)
static inline enum kvm_pgtable_prot pkvm_mkstate(enum kvm_pgtable_prot prot,
enum pkvm_page_state state)
{
- return (prot & ~PKVM_PAGE_STATE_PROT_MASK) | state;
+ BUG_ON(state & PKVM_PAGE_META_STATES_MASK);
+ prot &= ~PKVM_PAGE_STATE_PROT_MASK;
+ prot |= FIELD_PREP(PKVM_PAGE_STATE_PROT_MASK, state);
+ return prot;
}
static inline enum pkvm_page_state pkvm_getstate(enum kvm_pgtable_prot prot)
{
- return prot & PKVM_PAGE_STATE_PROT_MASK;
+ return FIELD_GET(PKVM_PAGE_STATE_PROT_MASK, prot);
}
struct host_mmu {
--
2.47.0.163.g1226f6d8fa-goog
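The new pkvm_mkstate()/pkvm_getstate() pair can be illustrated outside the kernel with a minimal user-space sketch. Everything prefixed with ex_/EX_ is a hypothetical stand-in, the software-bit positions (55 and 56) are an assumption made for illustration, and the helper returns an error where the patch uses BUG_ON().

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins; the SW bit positions are assumptions of this sketch. */
#define EX_BIT(n)           (1ULL << (n))
#define EX_PROT_SW0         EX_BIT(55)
#define EX_PROT_SW1         EX_BIT(56)
#define EX_STATE_PROT_MASK  (EX_PROT_SW0 | EX_PROT_SW1)
#define EX_META_STATES_MASK (~(EX_BIT(0) | EX_BIT(1)))

enum ex_page_state {
	EX_PAGE_OWNED           = 0,
	EX_PAGE_SHARED_OWNED    = 1,	/* BIT(0) */
	EX_PAGE_SHARED_BORROWED = 2,	/* BIT(1) */
	EX_PAGE_RESERVED        = 3,	/* BIT(0) | BIT(1) */
	EX_NOPAGE               = 4,	/* BIT(2), a meta-state */
};

/* Position of the lowest set bit of a non-zero mask. */
static unsigned int ex_mask_shift(uint64_t mask)
{
	unsigned int shift = 0;

	while (!(mask & 1)) {
		mask >>= 1;
		shift++;
	}
	return shift;
}

/* User-space equivalents of the shift-and-mask done by FIELD_PREP/FIELD_GET. */
static uint64_t ex_field_prep(uint64_t mask, uint64_t val)
{
	return (val << ex_mask_shift(mask)) & mask;
}

static uint64_t ex_field_get(uint64_t mask, uint64_t reg)
{
	return (reg & mask) >> ex_mask_shift(mask);
}

/* Returns 0 and stores the updated prot, or -1 if state is a meta-state. */
static int ex_mkstate(uint64_t prot, enum ex_page_state state, uint64_t *out)
{
	if ((uint64_t)state & EX_META_STATES_MASK)
		return -1;

	prot &= ~EX_STATE_PROT_MASK;
	prot |= ex_field_prep(EX_STATE_PROT_MASK, state);
	*out = prot;
	return 0;
}

static enum ex_page_state ex_getstate(uint64_t prot)
{
	return (enum ex_page_state)ex_field_get(EX_STATE_PROT_MASK, prot);
}
```

With this layout the enum values no longer depend on where the software bits sit in the PTE, which is what would let the same values be stored elsewhere later on.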
* Re: [PATCH 01/18] KVM: arm64: Change the layout of enum pkvm_page_state
From: Sebastian Ene @ 2024-11-04 17:31 UTC
To: Quentin Perret
Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Catalin Marinas, Will Deacon, Fuad Tabba,
Vincent Donnefort, linux-arm-kernel, kvmarm, linux-kernel
On Mon, Nov 04, 2024 at 01:31:47PM +0000, Quentin Perret wrote:
Hi Quentin,
> The 'concrete' (a.k.a non-meta) page states are currently encoded using
> software bits in PTEs. For performance reasons, the abstract
> pkvm_page_state enum uses the same bits to encode these states as that
> makes conversions from and to PTEs easy.
>
> In order to prepare the ground for moving the 'concrete' state storage
> to the hyp vmemmap, re-arrange the enum to use bits 0 and 1 for this
> purpose.
>
> No functional changes intended.
>
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
> arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 17 ++++++++++-------
> 1 file changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> index 0972faccc2af..ca3177481b78 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> @@ -24,25 +24,28 @@
> */
> enum pkvm_page_state {
> PKVM_PAGE_OWNED = 0ULL,
> - PKVM_PAGE_SHARED_OWNED = KVM_PGTABLE_PROT_SW0,
> - PKVM_PAGE_SHARED_BORROWED = KVM_PGTABLE_PROT_SW1,
> - __PKVM_PAGE_RESERVED = KVM_PGTABLE_PROT_SW0 |
> - KVM_PGTABLE_PROT_SW1,
> + PKVM_PAGE_SHARED_OWNED = BIT(0),
> + PKVM_PAGE_SHARED_BORROWED = BIT(1),
> + __PKVM_PAGE_RESERVED = BIT(0) | BIT(1),
>
> /* Meta-states which aren't encoded directly in the PTE's SW bits */
> - PKVM_NOPAGE,
> + PKVM_NOPAGE = BIT(2),
> };
I guess we will still have to keep around the software bit annotation
for sharing MMIO regions from the host. This would not be tracked by the
vmemmap but it will still be in the s2 pagetable. As this is tagged with no
functional changes intended, are we safe because we are not supporting
MMIO sharing currently?
> +#define PKVM_PAGE_META_STATES_MASK (~(BIT(0) | BIT(1)))
>
> #define PKVM_PAGE_STATE_PROT_MASK (KVM_PGTABLE_PROT_SW0 | KVM_PGTABLE_PROT_SW1)
> static inline enum kvm_pgtable_prot pkvm_mkstate(enum kvm_pgtable_prot prot,
> enum pkvm_page_state state)
> {
> - return (prot & ~PKVM_PAGE_STATE_PROT_MASK) | state;
> + BUG_ON(state & PKVM_PAGE_META_STATES_MASK);
> + prot &= ~PKVM_PAGE_STATE_PROT_MASK;
> + prot |= FIELD_PREP(PKVM_PAGE_STATE_PROT_MASK, state);
> + return prot;
> }
>
> static inline enum pkvm_page_state pkvm_getstate(enum kvm_pgtable_prot prot)
> {
> - return prot & PKVM_PAGE_STATE_PROT_MASK;
> + return FIELD_GET(PKVM_PAGE_STATE_PROT_MASK, prot);
> }
>
Thanks,
Sebastian
> struct host_mmu {
> --
> 2.47.0.163.g1226f6d8fa-goog
>
* Re: [PATCH 01/18] KVM: arm64: Change the layout of enum pkvm_page_state
From: Quentin Perret @ 2024-11-04 17:46 UTC
To: Sebastian Ene
Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Catalin Marinas, Will Deacon, Fuad Tabba,
Vincent Donnefort, linux-arm-kernel, kvmarm, linux-kernel
Hi Seb,
On Monday 04 Nov 2024 at 17:31:50 (+0000), Sebastian Ene wrote:
> On Mon, Nov 04, 2024 at 01:31:47PM +0000, Quentin Perret wrote:
>
> Hi Quentin,
>
> > The 'concrete' (a.k.a non-meta) page states are currently encoded using
> > software bits in PTEs. For performance reasons, the abstract
> > pkvm_page_state enum uses the same bits to encode these states as that
> > makes conversions from and to PTEs easy.
> >
> > In order to prepare the ground for moving the 'concrete' state storage
> > to the hyp vmemmap, re-arrange the enum to use bits 0 and 1 for this
> > purpose.
> >
> > No functional changes intended.
> >
> > Signed-off-by: Quentin Perret <qperret@google.com>
> > ---
> > arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 17 ++++++++++-------
> > 1 file changed, 10 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > index 0972faccc2af..ca3177481b78 100644
> > --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > @@ -24,25 +24,28 @@
> > */
> > enum pkvm_page_state {
> > PKVM_PAGE_OWNED = 0ULL,
> > - PKVM_PAGE_SHARED_OWNED = KVM_PGTABLE_PROT_SW0,
> > - PKVM_PAGE_SHARED_BORROWED = KVM_PGTABLE_PROT_SW1,
> > - __PKVM_PAGE_RESERVED = KVM_PGTABLE_PROT_SW0 |
> > - KVM_PGTABLE_PROT_SW1,
> > + PKVM_PAGE_SHARED_OWNED = BIT(0),
> > + PKVM_PAGE_SHARED_BORROWED = BIT(1),
> > + __PKVM_PAGE_RESERVED = BIT(0) | BIT(1),
> >
> > /* Meta-states which aren't encoded directly in the PTE's SW bits */
> > - PKVM_NOPAGE,
> > + PKVM_NOPAGE = BIT(2),
> > };
>
> I guess we will still have to keep around the software bit annotation
> for sharing MMIO regions from the host. This would not be tracked by the
> vmemmap but it will still be in the s2 pagetable. As this is tagged with no
> functional changes intended, are we safe because we are not supporting
> MMIO sharing currently ?
That's right, currently no MMIO sharing is allowed -- see how
host_get_page_state() returns PKVM_NOPAGE for non-memory addresses, so
we should be good!
Sharing/donating devices is absolutely going to be needed eventually,
but I hoped we could make that a separate series. And yes, we'll need
_some_ data-structure at EL2 to track that, possibly the page-table,
but could also be something else similar to how this series moves that
away from the pgt for memory. Hopefully that's an orthogonal discussion
we can have later :-)
Thanks,
Quentin
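The behaviour described in this reply, where non-memory addresses always report PKVM_NOPAGE so that MMIO can never pass a sharing check, can be sketched as follows. This is a simplified user-space model: the ex_ names and the single-bank memory map are hypothetical, and the real host_get_page_state() also inspects the PTE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum ex_page_state {
	EX_PAGE_OWNED           = 0,
	EX_PAGE_SHARED_OWNED    = 1,
	EX_PAGE_SHARED_BORROWED = 2,
	EX_NOPAGE               = 4,
};

struct ex_mem_range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

/* Hypothetical memory map with a single RAM bank. */
static const struct ex_mem_range ex_ram[] = {
	{ 0x80000000ULL, 0xc0000000ULL },
};

static bool ex_addr_is_memory(uint64_t addr)
{
	for (size_t i = 0; i < sizeof(ex_ram) / sizeof(ex_ram[0]); i++) {
		if (ex_ram[i].start <= addr && addr < ex_ram[i].end)
			return true;
	}
	return false;
}

/*
 * 'tracked' is whatever state is recorded for the page. It is only
 * reported when the address is backed by memory; anything else (e.g.
 * MMIO) is reported as EX_NOPAGE.
 */
static enum ex_page_state ex_host_get_page_state(uint64_t addr,
						 enum ex_page_state tracked)
{
	if (!ex_addr_is_memory(addr))
		return EX_NOPAGE;

	return tracked;
}
```

Because a share request is only accepted for a page in a concrete state, an address that always reads back as EX_NOPAGE cannot be shared in this model.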
* [PATCH 02/18] KVM: arm64: Move enum pkvm_page_state to memory.h
From: Quentin Perret @ 2024-11-04 13:31 UTC
To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Catalin Marinas, Will Deacon
Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
kvmarm, linux-kernel
In order to prepare the way for storing page-tracking information in
pKVM's vmemmap, move the enum pkvm_page_state definition to
nvhe/memory.h.
No functional changes intended.
Signed-off-by: Quentin Perret <qperret@google.com>
---
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 35 +------------------
arch/arm64/kvm/hyp/include/nvhe/memory.h | 34 ++++++++++++++++++
2 files changed, 35 insertions(+), 34 deletions(-)
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index ca3177481b78..25038ac705d8 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -11,43 +11,10 @@
#include <asm/kvm_mmu.h>
#include <asm/kvm_pgtable.h>
#include <asm/virt.h>
+#include <nvhe/memory.h>
#include <nvhe/pkvm.h>
#include <nvhe/spinlock.h>
-/*
- * SW bits 0-1 are reserved to track the memory ownership state of each page:
- * 00: The page is owned exclusively by the page-table owner.
- * 01: The page is owned by the page-table owner, but is shared
- * with another entity.
- * 10: The page is shared with, but not owned by the page-table owner.
- * 11: Reserved for future use (lending).
- */
-enum pkvm_page_state {
- PKVM_PAGE_OWNED = 0ULL,
- PKVM_PAGE_SHARED_OWNED = BIT(0),
- PKVM_PAGE_SHARED_BORROWED = BIT(1),
- __PKVM_PAGE_RESERVED = BIT(0) | BIT(1),
-
- /* Meta-states which aren't encoded directly in the PTE's SW bits */
- PKVM_NOPAGE = BIT(2),
-};
-#define PKVM_PAGE_META_STATES_MASK (~(BIT(0) | BIT(1)))
-
-#define PKVM_PAGE_STATE_PROT_MASK (KVM_PGTABLE_PROT_SW0 | KVM_PGTABLE_PROT_SW1)
-static inline enum kvm_pgtable_prot pkvm_mkstate(enum kvm_pgtable_prot prot,
- enum pkvm_page_state state)
-{
- BUG_ON(state & PKVM_PAGE_META_STATES_MASK);
- prot &= ~PKVM_PAGE_STATE_PROT_MASK;
- prot |= FIELD_PREP(PKVM_PAGE_STATE_PROT_MASK, state);
- return prot;
-}
-
-static inline enum pkvm_page_state pkvm_getstate(enum kvm_pgtable_prot prot)
-{
- return FIELD_GET(PKVM_PAGE_STATE_PROT_MASK, prot);
-}
-
struct host_mmu {
struct kvm_arch arch;
struct kvm_pgtable pgt;
diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
index ab205c4d6774..6dfeb000371c 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
@@ -7,6 +7,40 @@
#include <linux/types.h>
+/*
+ * SW bits 0-1 are reserved to track the memory ownership state of each page:
+ * 00: The page is owned exclusively by the page-table owner.
+ * 01: The page is owned by the page-table owner, but is shared
+ * with another entity.
+ * 10: The page is shared with, but not owned by the page-table owner.
+ * 11: Reserved for future use (lending).
+ */
+enum pkvm_page_state {
+ PKVM_PAGE_OWNED = 0ULL,
+ PKVM_PAGE_SHARED_OWNED = BIT(0),
+ PKVM_PAGE_SHARED_BORROWED = BIT(1),
+ __PKVM_PAGE_RESERVED = BIT(0) | BIT(1),
+
+ /* Meta-states which aren't encoded directly in the PTE's SW bits */
+ PKVM_NOPAGE = BIT(2),
+};
+#define PKVM_PAGE_META_STATES_MASK (~(BIT(0) | BIT(1)))
+
+#define PKVM_PAGE_STATE_PROT_MASK (KVM_PGTABLE_PROT_SW0 | KVM_PGTABLE_PROT_SW1)
+static inline enum kvm_pgtable_prot pkvm_mkstate(enum kvm_pgtable_prot prot,
+ enum pkvm_page_state state)
+{
+ BUG_ON(state & PKVM_PAGE_META_STATES_MASK);
+ prot &= ~PKVM_PAGE_STATE_PROT_MASK;
+ prot |= FIELD_PREP(PKVM_PAGE_STATE_PROT_MASK, state);
+ return prot;
+}
+
+static inline enum pkvm_page_state pkvm_getstate(enum kvm_pgtable_prot prot)
+{
+ return FIELD_GET(PKVM_PAGE_STATE_PROT_MASK, prot);
+}
+
struct hyp_page {
unsigned short refcount;
unsigned short order;
--
2.47.0.163.g1226f6d8fa-goog
* [PATCH 03/18] KVM: arm64: Make hyp_page::order a u8
From: Quentin Perret @ 2024-11-04 13:31 UTC
To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Catalin Marinas, Will Deacon
Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
kvmarm, linux-kernel
We don't need 16 bits to store the hyp page order, and we'll need some
bits to store page ownership data soon, so let's reduce the order
member.
Signed-off-by: Quentin Perret <qperret@google.com>
---
arch/arm64/kvm/hyp/include/nvhe/gfp.h | 6 +++---
arch/arm64/kvm/hyp/include/nvhe/memory.h | 5 +++--
arch/arm64/kvm/hyp/nvhe/page_alloc.c | 14 +++++++-------
3 files changed, 13 insertions(+), 12 deletions(-)
diff --git a/arch/arm64/kvm/hyp/include/nvhe/gfp.h b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
index 97c527ef53c2..f1725bad6331 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/gfp.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
@@ -7,7 +7,7 @@
#include <nvhe/memory.h>
#include <nvhe/spinlock.h>
-#define HYP_NO_ORDER USHRT_MAX
+#define HYP_NO_ORDER 0xff
struct hyp_pool {
/*
@@ -19,11 +19,11 @@ struct hyp_pool {
struct list_head free_area[NR_PAGE_ORDERS];
phys_addr_t range_start;
phys_addr_t range_end;
- unsigned short max_order;
+ u8 max_order;
};
/* Allocation */
-void *hyp_alloc_pages(struct hyp_pool *pool, unsigned short order);
+void *hyp_alloc_pages(struct hyp_pool *pool, u8 order);
void hyp_split_page(struct hyp_page *page);
void hyp_get_page(struct hyp_pool *pool, void *addr);
void hyp_put_page(struct hyp_pool *pool, void *addr);
diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
index 6dfeb000371c..88cb8ff9e769 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
@@ -42,8 +42,9 @@ static inline enum pkvm_page_state pkvm_getstate(enum kvm_pgtable_prot prot)
}
struct hyp_page {
- unsigned short refcount;
- unsigned short order;
+ u16 refcount;
+ u8 order;
+ u8 reserved;
};
extern u64 __hyp_vmemmap;
diff --git a/arch/arm64/kvm/hyp/nvhe/page_alloc.c b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
index e691290d3765..a1eb27a1a747 100644
--- a/arch/arm64/kvm/hyp/nvhe/page_alloc.c
+++ b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
@@ -32,7 +32,7 @@ u64 __hyp_vmemmap;
*/
static struct hyp_page *__find_buddy_nocheck(struct hyp_pool *pool,
struct hyp_page *p,
- unsigned short order)
+ u8 order)
{
phys_addr_t addr = hyp_page_to_phys(p);
@@ -51,7 +51,7 @@ static struct hyp_page *__find_buddy_nocheck(struct hyp_pool *pool,
/* Find a buddy page currently available for allocation */
static struct hyp_page *__find_buddy_avail(struct hyp_pool *pool,
struct hyp_page *p,
- unsigned short order)
+ u8 order)
{
struct hyp_page *buddy = __find_buddy_nocheck(pool, p, order);
@@ -94,7 +94,7 @@ static void __hyp_attach_page(struct hyp_pool *pool,
struct hyp_page *p)
{
phys_addr_t phys = hyp_page_to_phys(p);
- unsigned short order = p->order;
+ u8 order = p->order;
struct hyp_page *buddy;
memset(hyp_page_to_virt(p), 0, PAGE_SIZE << p->order);
@@ -129,7 +129,7 @@ static void __hyp_attach_page(struct hyp_pool *pool,
static struct hyp_page *__hyp_extract_page(struct hyp_pool *pool,
struct hyp_page *p,
- unsigned short order)
+ u8 order)
{
struct hyp_page *buddy;
@@ -183,7 +183,7 @@ void hyp_get_page(struct hyp_pool *pool, void *addr)
void hyp_split_page(struct hyp_page *p)
{
- unsigned short order = p->order;
+ u8 order = p->order;
unsigned int i;
p->order = 0;
@@ -195,10 +195,10 @@ void hyp_split_page(struct hyp_page *p)
}
}
-void *hyp_alloc_pages(struct hyp_pool *pool, unsigned short order)
+void *hyp_alloc_pages(struct hyp_pool *pool, u8 order)
{
- unsigned short i = order;
struct hyp_page *p;
+ u8 i = order;
hyp_spin_lock(&pool->lock);
--
2.47.0.163.g1226f6d8fa-goog
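The effect of narrowing the order field can be checked with a small user-space sketch. The struct and macro names below are illustrative copies rather than the kernel definitions, and the size comparison assumes a typical ABI where unsigned short is 16 bits wide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_NO_ORDER 0xff	/* sentinel, mirrors the new HYP_NO_ORDER */

/* Layout before the patch. */
struct ex_hyp_page_old {
	unsigned short refcount;
	unsigned short order;
};

/* Layout after the patch: one byte is freed up for later use. */
struct ex_hyp_page_new {
	uint16_t refcount;
	uint8_t  order;
	uint8_t  reserved;
};

/* The per-page metadata must not grow, as there is one entry per page. */
_Static_assert(sizeof(struct ex_hyp_page_new) == sizeof(struct ex_hyp_page_old),
	       "narrowing order must not change the struct size");

/* Any order below the sentinel still fits in the u8 field. */
static bool ex_order_is_valid(unsigned int order)
{
	return order < EX_NO_ORDER;
}
```

An order describes a block of PAGE_SIZE << order bytes, so realistic orders are small numbers and sit well below the 0xff sentinel.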
* [PATCH 04/18] KVM: arm64: Move host page ownership tracking to the hyp vmemmap
From: Quentin Perret @ 2024-11-04 13:31 UTC
To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Catalin Marinas, Will Deacon
Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
kvmarm, linux-kernel
We currently store part of the page-tracking state in PTE software bits
for the host, guests and the hypervisor. This is sub-optimal when e.g.
sharing pages as this forces us to break block mappings purely to support
this software tracking. This causes an unnecessarily fragmented stage-2
page-table for the host in particular when it shares pages with Secure,
which can lead to measurable regressions. Moreover, having this state
stored in the page-table forces us to do multiple costly walks on the
page transition path, hence causing overhead.
In order to work around these problems, move the host-side page-tracking
logic from SW bits in its stage-2 PTEs to the hypervisor's vmemmap.
Signed-off-by: Quentin Perret <qperret@google.com>
---
arch/arm64/kvm/hyp/include/nvhe/memory.h | 6 +-
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 94 ++++++++++++++++--------
arch/arm64/kvm/hyp/nvhe/setup.c | 7 +-
3 files changed, 71 insertions(+), 36 deletions(-)
diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
index 88cb8ff9e769..08f3a0416d4c 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
@@ -8,7 +8,7 @@
#include <linux/types.h>
/*
- * SW bits 0-1 are reserved to track the memory ownership state of each page:
+ * Bits 0-1 are reserved to track the memory ownership state of each page:
* 00: The page is owned exclusively by the page-table owner.
* 01: The page is owned by the page-table owner, but is shared
* with another entity.
@@ -44,7 +44,9 @@ static inline enum pkvm_page_state pkvm_getstate(enum kvm_pgtable_prot prot)
struct hyp_page {
u16 refcount;
u8 order;
- u8 reserved;
+
+ /* Host (non-meta) state. Guarded by the host stage-2 lock. */
+ enum pkvm_page_state host_state : 8;
};
extern u64 __hyp_vmemmap;
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index caba3e4bd09e..1595081c4f6b 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -201,8 +201,8 @@ static void *guest_s2_zalloc_page(void *mc)
memset(addr, 0, PAGE_SIZE);
p = hyp_virt_to_page(addr);
- memset(p, 0, sizeof(*p));
p->refcount = 1;
+ p->order = 0;
return addr;
}
@@ -268,6 +268,7 @@ int kvm_guest_prepare_stage2(struct pkvm_hyp_vm *vm, void *pgd)
void reclaim_guest_pages(struct pkvm_hyp_vm *vm, struct kvm_hyp_memcache *mc)
{
+ struct hyp_page *page;
void *addr;
/* Dump all pgtable pages in the hyp_pool */
@@ -279,7 +280,9 @@ void reclaim_guest_pages(struct pkvm_hyp_vm *vm, struct kvm_hyp_memcache *mc)
/* Drain the hyp_pool into the memcache */
addr = hyp_alloc_pages(&vm->pool, 0);
while (addr) {
- memset(hyp_virt_to_page(addr), 0, sizeof(struct hyp_page));
+ page = hyp_virt_to_page(addr);
+ page->refcount = 0;
+ page->order = 0;
push_hyp_memcache(mc, addr, hyp_virt_to_phys);
WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(addr), 1));
addr = hyp_alloc_pages(&vm->pool, 0);
@@ -382,19 +385,25 @@ bool addr_is_memory(phys_addr_t phys)
return !!find_mem_range(phys, &range);
}
-static bool addr_is_allowed_memory(phys_addr_t phys)
+static bool is_in_mem_range(u64 addr, struct kvm_mem_range *range)
+{
+ return range->start <= addr && addr < range->end;
+}
+
+static int range_is_allowed_memory(u64 start, u64 end)
{
struct memblock_region *reg;
struct kvm_mem_range range;
- reg = find_mem_range(phys, &range);
+ /* Can't check the state of both MMIO and memory regions at once */
+ reg = find_mem_range(start, &range);
+ if (!is_in_mem_range(end - 1, &range))
+ return -EINVAL;
- return reg && !(reg->flags & MEMBLOCK_NOMAP);
-}
+ if (!reg || reg->flags & MEMBLOCK_NOMAP)
+ return -EPERM;
-static bool is_in_mem_range(u64 addr, struct kvm_mem_range *range)
-{
- return range->start <= addr && addr < range->end;
+ return 0;
}
static bool range_is_memory(u64 start, u64 end)
@@ -454,8 +463,11 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
if (kvm_pte_valid(pte))
return -EAGAIN;
- if (pte)
+ if (pte) {
+ WARN_ON(addr_is_memory(addr) &&
+ !(hyp_phys_to_page(addr)->host_state & PKVM_NOPAGE));
return -EPERM;
+ }
do {
u64 granule = kvm_granule_size(level);
@@ -477,10 +489,29 @@ int host_stage2_idmap_locked(phys_addr_t addr, u64 size,
return host_stage2_try(__host_stage2_idmap, addr, addr + size, prot);
}
+static void __host_update_page_state(phys_addr_t addr, u64 size, enum pkvm_page_state state)
+{
+ phys_addr_t end = addr + size;
+ for (; addr < end; addr += PAGE_SIZE)
+ hyp_phys_to_page(addr)->host_state = state;
+}
+
int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
{
- return host_stage2_try(kvm_pgtable_stage2_set_owner, &host_mmu.pgt,
- addr, size, &host_s2_pool, owner_id);
+ int ret;
+
+ ret = host_stage2_try(kvm_pgtable_stage2_set_owner, &host_mmu.pgt,
+ addr, size, &host_s2_pool, owner_id);
+ if (ret || !addr_is_memory(addr))
+ return ret;
+
+ /* Don't forget to update the vmemmap tracking for the host */
+ if (owner_id == PKVM_ID_HOST)
+ __host_update_page_state(addr, size, PKVM_PAGE_OWNED);
+ else
+ __host_update_page_state(addr, size, PKVM_NOPAGE);
+
+ return 0;
}
static bool host_stage2_force_pte_cb(u64 addr, u64 end, enum kvm_pgtable_prot prot)
@@ -604,35 +635,38 @@ static int check_page_state_range(struct kvm_pgtable *pgt, u64 addr, u64 size,
return kvm_pgtable_walk(pgt, addr, size, &walker);
}
-static enum pkvm_page_state host_get_page_state(kvm_pte_t pte, u64 addr)
-{
- if (!addr_is_allowed_memory(addr))
- return PKVM_NOPAGE;
-
- if (!kvm_pte_valid(pte) && pte)
- return PKVM_NOPAGE;
-
- return pkvm_getstate(kvm_pgtable_stage2_pte_prot(pte));
-}
-
static int __host_check_page_state_range(u64 addr, u64 size,
enum pkvm_page_state state)
{
- struct check_walk_data d = {
- .desired = state,
- .get_page_state = host_get_page_state,
- };
+ u64 end = addr + size;
+ int ret;
+
+ ret = range_is_allowed_memory(addr, end);
+ if (ret)
+ return ret;
hyp_assert_lock_held(&host_mmu.lock);
- return check_page_state_range(&host_mmu.pgt, addr, size, &d);
+ for (; addr < end; addr += PAGE_SIZE) {
+ if (hyp_phys_to_page(addr)->host_state != state)
+ return -EPERM;
+ }
+
+ return 0;
}
static int __host_set_page_state_range(u64 addr, u64 size,
enum pkvm_page_state state)
{
- enum kvm_pgtable_prot prot = pkvm_mkstate(PKVM_HOST_MEM_PROT, state);
+ if (hyp_phys_to_page(addr)->host_state & PKVM_NOPAGE) {
+ int ret = host_stage2_idmap_locked(addr, size, PKVM_HOST_MEM_PROT);
- return host_stage2_idmap_locked(addr, size, prot);
+ if (ret)
+ return ret;
+ }
+
+ __host_update_page_state(addr, size, state);
+
+ return 0;
}
static int host_request_owned_transition(u64 *completer_addr,
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 174007f3fadd..c315710f57ad 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -198,7 +198,6 @@ static void hpool_put_page(void *addr)
static int fix_host_ownership_walker(const struct kvm_pgtable_visit_ctx *ctx,
enum kvm_pgtable_walk_flags visit)
{
- enum kvm_pgtable_prot prot;
enum pkvm_page_state state;
phys_addr_t phys;
@@ -221,16 +220,16 @@ static int fix_host_ownership_walker(const struct kvm_pgtable_visit_ctx *ctx,
case PKVM_PAGE_OWNED:
return host_stage2_set_owner_locked(phys, PAGE_SIZE, PKVM_ID_HYP);
case PKVM_PAGE_SHARED_OWNED:
- prot = pkvm_mkstate(PKVM_HOST_MEM_PROT, PKVM_PAGE_SHARED_BORROWED);
+ hyp_phys_to_page(phys)->host_state = PKVM_PAGE_SHARED_BORROWED;
break;
case PKVM_PAGE_SHARED_BORROWED:
- prot = pkvm_mkstate(PKVM_HOST_MEM_PROT, PKVM_PAGE_SHARED_OWNED);
+ hyp_phys_to_page(phys)->host_state = PKVM_PAGE_SHARED_OWNED;
break;
default:
return -EINVAL;
}
- return host_stage2_idmap_locked(phys, PAGE_SIZE, prot);
+ return 0;
}
static int fix_hyp_pgtable_refcnt_walker(const struct kvm_pgtable_visit_ctx *ctx,
--
2.47.0.163.g1226f6d8fa-goog
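The idea of keeping the host state in a per-page array instead of in PTE software bits can be modelled in user space as below. This is only a sketch: the ex_ names, the 16-page array and the assumption that physical memory starts at address 0 are all made up for illustration, and no locking is shown.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define EX_PAGE_SHIFT 12
#define EX_PAGE_SIZE  (1ULL << EX_PAGE_SHIFT)
#define EX_NR_PAGES   16ULL	/* hypothetical: memory spans pages 0..15 */

enum ex_page_state {
	EX_PAGE_OWNED           = 0,
	EX_PAGE_SHARED_OWNED    = 1,
	EX_PAGE_SHARED_BORROWED = 2,
	EX_NOPAGE               = 4,
};

struct ex_hyp_page {
	uint16_t refcount;
	uint8_t  order;
	uint8_t  host_state;	/* holds an enum ex_page_state value */
};

/* One entry per physical page; zero-initialised, so every page starts OWNED. */
static struct ex_hyp_page ex_vmemmap[EX_NR_PAGES];

static struct ex_hyp_page *ex_phys_to_page(uint64_t phys)
{
	return &ex_vmemmap[phys >> EX_PAGE_SHIFT];
}

/* The caller is expected to have validated [addr, addr + size). */
static void ex_update_page_state(uint64_t addr, uint64_t size,
				 enum ex_page_state state)
{
	uint64_t end = addr + size;

	for (; addr < end; addr += EX_PAGE_SIZE)
		ex_phys_to_page(addr)->host_state = (uint8_t)state;
}

/* Array lookups replace the page-table walk used before the patch. */
static int ex_check_page_state_range(uint64_t addr, uint64_t size,
				     enum ex_page_state state)
{
	uint64_t end = addr + size;

	if (end < addr || end > EX_NR_PAGES * EX_PAGE_SIZE)
		return -EINVAL;

	for (; addr < end; addr += EX_PAGE_SIZE) {
		if (ex_phys_to_page(addr)->host_state != state)
			return -EPERM;
	}
	return 0;
}
```

In this model, checking or changing the state of a range costs one array access per page and never touches the page-table, so a block mapping does not need to be split just to record a state.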
* [PATCH 05/18] KVM: arm64: Pass walk flags to kvm_pgtable_stage2_mkyoung
From: Quentin Perret @ 2024-11-04 13:31 UTC
To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Catalin Marinas, Will Deacon
Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
kvmarm, linux-kernel
kvm_pgtable_stage2_mkyoung currently assumes that it is being called
from a 'shared' walker, which will not be true once called from pKVM.
To allow for the re-use of that function, make the walk flags one of
its parameters.
Signed-off-by: Quentin Perret <qperret@google.com>
---
arch/arm64/include/asm/kvm_pgtable.h | 4 +++-
arch/arm64/kvm/hyp/pgtable.c | 7 +++----
arch/arm64/kvm/mmu.c | 3 ++-
3 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 03f4c3d7839c..442a45d38e23 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -669,6 +669,7 @@ int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size);
* kvm_pgtable_stage2_mkyoung() - Set the access flag in a page-table entry.
* @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*().
* @addr: Intermediate physical address to identify the page-table entry.
+ * @flags: Flags to control the page-table walk (ex. a shared walk)
*
* The offset of @addr within a page is ignored.
*
@@ -677,7 +678,8 @@ int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size);
*
* Return: The old page-table entry prior to setting the flag, 0 on failure.
*/
-kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr);
+kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr,
+ enum kvm_pgtable_walk_flags flags);
/**
* kvm_pgtable_stage2_test_clear_young() - Test and optionally clear the access
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index b11bcebac908..fa25062f0590 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -1245,15 +1245,14 @@ int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
NULL, NULL, 0);
}
-kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr)
+kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr,
+ enum kvm_pgtable_walk_flags flags)
{
kvm_pte_t pte = 0;
int ret;
ret = stage2_update_leaf_attrs(pgt, addr, 1, KVM_PTE_LEAF_ATTR_LO_S2_AF, 0,
- &pte, NULL,
- KVM_PGTABLE_WALK_HANDLE_FAULT |
- KVM_PGTABLE_WALK_SHARED);
+ &pte, NULL, flags);
if (!ret)
dsb(ishst);
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 0f7658aefa1a..27e1b281f402 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1708,6 +1708,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
/* Resolve the access fault by making the page young again. */
static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
{
+ enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED;
kvm_pte_t pte;
struct kvm_s2_mmu *mmu;
@@ -1715,7 +1716,7 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
read_lock(&vcpu->kvm->mmu_lock);
mmu = vcpu->arch.hw_mmu;
- pte = kvm_pgtable_stage2_mkyoung(mmu->pgt, fault_ipa);
+ pte = kvm_pgtable_stage2_mkyoung(mmu->pgt, fault_ipa, flags);
read_unlock(&vcpu->kvm->mmu_lock);
if (kvm_pte_valid(pte))
--
2.47.0.163.g1226f6d8fa-goog
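Why the caller should choose the walk flags can be illustrated with a user-space sketch using C11 atomics. All names and flag values here are hypothetical, and the split between a compare-and-swap update for shared walks and a plain store for exclusive walks is an assumption of this model, not a copy of the kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical flag values for this sketch. */
enum ex_walk_flags {
	EX_WALK_SHARED       = 1u << 0,
	EX_WALK_HANDLE_FAULT = 1u << 1,
};

#define EX_PTE_VALID (1ULL << 0)
#define EX_PTE_AF    (1ULL << 10)	/* access flag */

/*
 * Set the access flag in *ptep. Returns the old entry, or 0 if the
 * entry is invalid. The caller states whether other walkers may run
 * concurrently instead of the helper assuming it.
 */
static uint64_t ex_mkyoung(_Atomic uint64_t *ptep, unsigned int flags)
{
	uint64_t old = atomic_load(ptep);

	if (flags & EX_WALK_SHARED) {
		/* Concurrent updates are possible: retry until ours lands. */
		do {
			if (!(old & EX_PTE_VALID))
				return 0;
		} while (!atomic_compare_exchange_weak(ptep, &old,
						       old | EX_PTE_AF));
		return old;
	}

	/* Exclusive walk: nobody else can modify the entry. */
	if (!(old & EX_PTE_VALID))
		return 0;
	atomic_store_explicit(ptep, old | EX_PTE_AF, memory_order_relaxed);
	return old;
}
```

A caller holding only a read lock would pass EX_WALK_SHARED | EX_WALK_HANDLE_FAULT, as handle_access_fault() does with the real flags after this patch, while a caller with exclusive access could omit the shared flag.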
* [PATCH 06/18] KVM: arm64: Pass walk flags to kvm_pgtable_stage2_relax_perms
From: Quentin Perret @ 2024-11-04 13:31 UTC
To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Catalin Marinas, Will Deacon
Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
kvmarm, linux-kernel
kvm_pgtable_stage2_relax_perms currently assumes that it is being called
from a 'shared' walker, which will not be true once called from pKVM. To
allow for the re-use of that function, make the walk flags one of its
parameters.
Signed-off-by: Quentin Perret <qperret@google.com>
---
arch/arm64/include/asm/kvm_pgtable.h | 4 +++-
arch/arm64/kvm/hyp/pgtable.c | 6 ++----
arch/arm64/kvm/mmu.c | 7 +++----
3 files changed, 8 insertions(+), 9 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 442a45d38e23..f52fa8158ce6 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -709,6 +709,7 @@ bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
* @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*().
* @addr: Intermediate physical address to identify the page-table entry.
* @prot: Additional permissions to grant for the mapping.
+ * @flags: Flags to control the page-table walk (ex. a shared walk)
*
* The offset of @addr within a page is ignored.
*
@@ -721,7 +722,8 @@ bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
* Return: 0 on success, negative error code on failure.
*/
int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
- enum kvm_pgtable_prot prot);
+ enum kvm_pgtable_prot prot,
+ enum kvm_pgtable_walk_flags flags);
/**
* kvm_pgtable_stage2_flush_range() - Clean and invalidate data cache to Point
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index fa25062f0590..ee060438dc77 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -1310,7 +1310,7 @@ bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
}
int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
- enum kvm_pgtable_prot prot)
+ enum kvm_pgtable_prot prot, enum kvm_pgtable_walk_flags flags)
{
int ret;
s8 level;
@@ -1328,9 +1328,7 @@ int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
if (prot & KVM_PGTABLE_PROT_X)
clr |= KVM_PTE_LEAF_ATTR_HI_S2_XN;
- ret = stage2_update_leaf_attrs(pgt, addr, 1, set, clr, NULL, &level,
- KVM_PGTABLE_WALK_HANDLE_FAULT |
- KVM_PGTABLE_WALK_SHARED);
+ ret = stage2_update_leaf_attrs(pgt, addr, 1, set, clr, NULL, &level, flags);
if (!ret || ret == -EAGAIN)
kvm_call_hyp(__kvm_tlb_flush_vmid_ipa_nsh, pgt->mmu, addr, level);
return ret;
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 27e1b281f402..80dd61038cc7 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1440,6 +1440,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
long vma_pagesize, fault_granule;
enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
struct kvm_pgtable *pgt;
+ enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED;
if (fault_is_perm)
fault_granule = kvm_vcpu_trap_get_perm_fault_granule(vcpu);
@@ -1683,13 +1684,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
* PTE, which will be preserved.
*/
prot &= ~KVM_NV_GUEST_MAP_SZ;
- ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
+ ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot, flags);
} else {
ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
__pfn_to_phys(pfn), prot,
- memcache,
- KVM_PGTABLE_WALK_HANDLE_FAULT |
- KVM_PGTABLE_WALK_SHARED);
+ memcache, flags);
}
out_unlock:
--
2.47.0.163.g1226f6d8fa-goog
* [PATCH 07/18] KVM: arm64: Make kvm_pgtable_stage2_init() a static inline function
From: Quentin Perret @ 2024-11-04 13:31 UTC (permalink / raw)
To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Catalin Marinas, Will Deacon
Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
kvmarm, linux-kernel
Turn kvm_pgtable_stage2_init() into a static inline function instead of
a macro. This will allow the usage of typeof() on it later on.
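To show why the conversion matters for typeof(), here is a minimal
stand-alone sketch. The names (struct pgt, stage2_init_common(),
stage2_init_macro(), stage2_init(), init_fn) are made up for the
example, and __typeof__ is the GCC/Clang spelling of the extension.

```c
struct pgt {
	int flags;
};

/* Hypothetical common initialiser taking the extra argument. */
static int stage2_init_common(struct pgt *pgt, int flags)
{
	pgt->flags = flags;
	return 0;
}

/*
 * A function-like macro has no type: writing typeof(stage2_init_macro)
 * would not compile, because the bare name is not a declared identifier.
 */
#define stage2_init_macro(pgt)	stage2_init_common(pgt, 0)

/* A static inline wrapper is a real function with a real type. */
static inline int stage2_init(struct pgt *pgt)
{
	return stage2_init_common(pgt, 0);
}

/* typeof() on the function yields its type, usable in declarations. */
static __typeof__(stage2_init) *init_fn = stage2_init;
```

Here init_fn has type int (*)(struct pgt *) without the prototype being
spelled out a second time.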
Signed-off-by: Quentin Perret <qperret@google.com>
---
arch/arm64/include/asm/kvm_pgtable.h | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index f52fa8158ce6..047e1c06ae4c 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -526,8 +526,11 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
enum kvm_pgtable_stage2_flags flags,
kvm_pgtable_force_pte_cb_t force_pte_cb);
-#define kvm_pgtable_stage2_init(pgt, mmu, mm_ops) \
- __kvm_pgtable_stage2_init(pgt, mmu, mm_ops, 0, NULL)
+static inline int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
+ struct kvm_pgtable_mm_ops *mm_ops)
+{
+ return __kvm_pgtable_stage2_init(pgt, mmu, mm_ops, 0, NULL);
+}
/**
* kvm_pgtable_stage2_destroy() - Destroy an unused guest stage-2 page-table.
--
2.47.0.163.g1226f6d8fa-goog
* [PATCH 08/18] KVM: arm64: Introduce pkvm_vcpu_{load,put}()
From: Quentin Perret @ 2024-11-04 13:31 UTC (permalink / raw)
To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Catalin Marinas, Will Deacon
Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
kvmarm, linux-kernel
From: Marc Zyngier <maz@kernel.org>
Rather than look up the hyp vCPU on every run hypercall at EL2,
introduce a per-CPU 'loaded_hyp_vcpu' tracking variable which is updated
by a pair of load/put hypercalls called directly from
kvm_arch_vcpu_{load,put}() when pKVM is enabled.
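The tracking scheme can be sketched with a simplified, single-threaded
model. This is an assumption-laden illustration only: the per-CPU
variable is modelled as a plain array indexed by a CPU number, there is
no locking or reference counting, and all names (struct hyp_vcpu,
vcpu_load(), vcpu_put(), get_loaded(), NR_CPUS) are hypothetical.

```c
#include <stddef.h>

#define NR_CPUS 2

struct hyp_vcpu {
	/* Backpointer to the slot tracking this vCPU, NULL when not loaded. */
	struct hyp_vcpu **loaded_slot;
};

/* Models the per-CPU 'loaded_hyp_vcpu' pointer as one slot per CPU. */
static struct hyp_vcpu *loaded_hyp_vcpu[NR_CPUS];

static struct hyp_vcpu *vcpu_load(int cpu, struct hyp_vcpu *vcpu)
{
	/* Cannot load a new vCPU without putting the old one first. */
	if (loaded_hyp_vcpu[cpu])
		return NULL;

	/* A vCPU must not be loaded on more than one CPU at a time. */
	if (vcpu->loaded_slot)
		return NULL;

	vcpu->loaded_slot = &loaded_hyp_vcpu[cpu];
	loaded_hyp_vcpu[cpu] = vcpu;
	return vcpu;
}

static void vcpu_put(int cpu)
{
	struct hyp_vcpu *vcpu = loaded_hyp_vcpu[cpu];

	if (!vcpu)
		return;

	vcpu->loaded_slot = NULL;
	loaded_hyp_vcpu[cpu] = NULL;
}

/* The run path only needs this cheap lookup, not a table walk. */
static struct hyp_vcpu *get_loaded(int cpu)
{
	return loaded_hyp_vcpu[cpu];
}
```

In this model the run path becomes a single array read, which is the
point of moving the lookup into the load hypercall.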
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
arch/arm64/include/asm/kvm_asm.h | 2 ++
arch/arm64/kvm/arm.c | 14 ++++++++
arch/arm64/kvm/hyp/include/nvhe/pkvm.h | 7 ++++
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 47 ++++++++++++++++++++------
arch/arm64/kvm/hyp/nvhe/pkvm.c | 28 +++++++++++++++
arch/arm64/kvm/vgic/vgic-v3.c | 6 ++--
6 files changed, 92 insertions(+), 12 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 67afac659231..a1c6dbec1871 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -80,6 +80,8 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_init_vcpu,
__KVM_HOST_SMCCC_FUNC___pkvm_teardown_vm,
+ __KVM_HOST_SMCCC_FUNC___pkvm_vcpu_load,
+ __KVM_HOST_SMCCC_FUNC___pkvm_vcpu_put,
};
#define DECLARE_KVM_VHE_SYM(sym) extern char sym[]
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 48cafb65d6ac..2bf168b17a77 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -623,12 +623,26 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
kvm_arch_vcpu_load_debug_state_flags(vcpu);
+ if (is_protected_kvm_enabled()) {
+ kvm_call_hyp_nvhe(__pkvm_vcpu_load,
+ vcpu->kvm->arch.pkvm.handle,
+ vcpu->vcpu_idx, vcpu->arch.hcr_el2);
+ kvm_call_hyp(__vgic_v3_restore_vmcr_aprs,
+ &vcpu->arch.vgic_cpu.vgic_v3);
+ }
+
if (!cpumask_test_cpu(cpu, vcpu->kvm->arch.supported_cpus))
vcpu_set_on_unsupported_cpu(vcpu);
}
void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
{
+ if (is_protected_kvm_enabled()) {
+ kvm_call_hyp(__vgic_v3_save_vmcr_aprs,
+ &vcpu->arch.vgic_cpu.vgic_v3);
+ kvm_call_hyp_nvhe(__pkvm_vcpu_put);
+ }
+
kvm_arch_vcpu_put_debug_state_flags(vcpu);
kvm_arch_vcpu_put_fp(vcpu);
if (has_vhe())
diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index 24a9a8330d19..6940eb171a52 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -20,6 +20,12 @@ struct pkvm_hyp_vcpu {
/* Backpointer to the host's (untrusted) vCPU instance. */
struct kvm_vcpu *host_vcpu;
+
+ /*
+ * If this hyp vCPU is loaded, then this is a backpointer to the
+ * per-cpu pointer tracking us. Otherwise, NULL if not loaded.
+ */
+ struct pkvm_hyp_vcpu **loaded_hyp_vcpu;
};
/*
@@ -69,5 +75,6 @@ int __pkvm_teardown_vm(pkvm_handle_t handle);
struct pkvm_hyp_vcpu *pkvm_load_hyp_vcpu(pkvm_handle_t handle,
unsigned int vcpu_idx);
void pkvm_put_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu);
+struct pkvm_hyp_vcpu *pkvm_get_loaded_hyp_vcpu(void);
#endif /* __ARM64_KVM_NVHE_PKVM_H__ */
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index fefc89209f9e..6bcdba4fdc76 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -139,16 +139,46 @@ static void sync_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
host_cpu_if->vgic_lr[i] = hyp_cpu_if->vgic_lr[i];
}
+static void handle___pkvm_vcpu_load(struct kvm_cpu_context *host_ctxt)
+{
+ DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
+ DECLARE_REG(unsigned int, vcpu_idx, host_ctxt, 2);
+ DECLARE_REG(u64, hcr_el2, host_ctxt, 3);
+ struct pkvm_hyp_vcpu *hyp_vcpu;
+
+ if (!is_protected_kvm_enabled())
+ return;
+
+ hyp_vcpu = pkvm_load_hyp_vcpu(handle, vcpu_idx);
+ if (!hyp_vcpu)
+ return;
+
+ if (pkvm_hyp_vcpu_is_protected(hyp_vcpu)) {
+ /* Propagate WFx trapping flags */
+ hyp_vcpu->vcpu.arch.hcr_el2 &= ~(HCR_TWE | HCR_TWI);
+ hyp_vcpu->vcpu.arch.hcr_el2 |= hcr_el2 & (HCR_TWE | HCR_TWI);
+ }
+}
+
+static void handle___pkvm_vcpu_put(struct kvm_cpu_context *host_ctxt)
+{
+ struct pkvm_hyp_vcpu *hyp_vcpu;
+
+ if (!is_protected_kvm_enabled())
+ return;
+
+ hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
+ if (hyp_vcpu)
+ pkvm_put_hyp_vcpu(hyp_vcpu);
+}
+
static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(struct kvm_vcpu *, host_vcpu, host_ctxt, 1);
int ret;
- host_vcpu = kern_hyp_va(host_vcpu);
-
if (unlikely(is_protected_kvm_enabled())) {
- struct pkvm_hyp_vcpu *hyp_vcpu;
- struct kvm *host_kvm;
+ struct pkvm_hyp_vcpu *hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
/*
* KVM (and pKVM) doesn't support SME guests for now, and
@@ -161,9 +191,6 @@ static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
goto out;
}
- host_kvm = kern_hyp_va(host_vcpu->kvm);
- hyp_vcpu = pkvm_load_hyp_vcpu(host_kvm->arch.pkvm.handle,
- host_vcpu->vcpu_idx);
if (!hyp_vcpu) {
ret = -EINVAL;
goto out;
@@ -174,12 +201,10 @@ static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
ret = __kvm_vcpu_run(&hyp_vcpu->vcpu);
sync_hyp_vcpu(hyp_vcpu);
- pkvm_put_hyp_vcpu(hyp_vcpu);
} else {
/* The host is fully trusted, run its vCPU directly. */
- ret = __kvm_vcpu_run(host_vcpu);
+ ret = __kvm_vcpu_run(kern_hyp_va(host_vcpu));
}
-
out:
cpu_reg(host_ctxt, 1) = ret;
}
@@ -415,6 +440,8 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__pkvm_init_vm),
HANDLE_FUNC(__pkvm_init_vcpu),
HANDLE_FUNC(__pkvm_teardown_vm),
+ HANDLE_FUNC(__pkvm_vcpu_load),
+ HANDLE_FUNC(__pkvm_vcpu_put),
};
static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 077d4098548d..9ed2b8a63371 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -20,6 +20,12 @@ unsigned int kvm_arm_vmid_bits;
unsigned int kvm_host_sve_max_vl;
+/*
+ * The currently loaded hyp vCPU for each physical CPU. Used only when
+ * protected KVM is enabled, but for both protected and non-protected VMs.
+ */
+static DEFINE_PER_CPU(struct pkvm_hyp_vcpu *, loaded_hyp_vcpu);
+
/*
* Set trap register values based on features in ID_AA64PFR0.
*/
@@ -268,15 +274,30 @@ struct pkvm_hyp_vcpu *pkvm_load_hyp_vcpu(pkvm_handle_t handle,
struct pkvm_hyp_vcpu *hyp_vcpu = NULL;
struct pkvm_hyp_vm *hyp_vm;
+ /* Cannot load a new vcpu without putting the old one first. */
+ if (__this_cpu_read(loaded_hyp_vcpu))
+ return NULL;
+
hyp_spin_lock(&vm_table_lock);
hyp_vm = get_vm_by_handle(handle);
if (!hyp_vm || hyp_vm->nr_vcpus <= vcpu_idx)
goto unlock;
hyp_vcpu = hyp_vm->vcpus[vcpu_idx];
+
+ /* Ensure vcpu isn't loaded on more than one cpu simultaneously. */
+ if (unlikely(hyp_vcpu->loaded_hyp_vcpu)) {
+ hyp_vcpu = NULL;
+ goto unlock;
+ }
+
+ hyp_vcpu->loaded_hyp_vcpu = this_cpu_ptr(&loaded_hyp_vcpu);
hyp_page_ref_inc(hyp_virt_to_page(hyp_vm));
unlock:
hyp_spin_unlock(&vm_table_lock);
+
+ if (hyp_vcpu)
+ __this_cpu_write(loaded_hyp_vcpu, hyp_vcpu);
return hyp_vcpu;
}
@@ -285,10 +306,17 @@ void pkvm_put_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
struct pkvm_hyp_vm *hyp_vm = pkvm_hyp_vcpu_to_hyp_vm(hyp_vcpu);
hyp_spin_lock(&vm_table_lock);
+ hyp_vcpu->loaded_hyp_vcpu = NULL;
+ __this_cpu_write(loaded_hyp_vcpu, NULL);
hyp_page_ref_dec(hyp_virt_to_page(hyp_vm));
hyp_spin_unlock(&vm_table_lock);
}
+struct pkvm_hyp_vcpu *pkvm_get_loaded_hyp_vcpu(void)
+{
+ return __this_cpu_read(loaded_hyp_vcpu);
+}
+
static void unpin_host_vcpu(struct kvm_vcpu *host_vcpu)
{
if (host_vcpu)
diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
index b217b256853c..e43a8bb3e6b0 100644
--- a/arch/arm64/kvm/vgic/vgic-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-v3.c
@@ -734,7 +734,8 @@ void vgic_v3_load(struct kvm_vcpu *vcpu)
{
struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
- kvm_call_hyp(__vgic_v3_restore_vmcr_aprs, cpu_if);
+ if (likely(!is_protected_kvm_enabled()))
+ kvm_call_hyp(__vgic_v3_restore_vmcr_aprs, cpu_if);
if (has_vhe())
__vgic_v3_activate_traps(cpu_if);
@@ -746,7 +747,8 @@ void vgic_v3_put(struct kvm_vcpu *vcpu)
{
struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
- kvm_call_hyp(__vgic_v3_save_vmcr_aprs, cpu_if);
+ if (likely(!is_protected_kvm_enabled()))
+ kvm_call_hyp(__vgic_v3_save_vmcr_aprs, cpu_if);
WARN_ON(vgic_v4_put(vcpu));
if (has_vhe())
--
2.47.0.163.g1226f6d8fa-goog
* [PATCH 09/18] KVM: arm64: Introduce {get,put}_pkvm_hyp_vm() helpers
From: Quentin Perret @ 2024-11-04 13:31 UTC (permalink / raw)
To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Catalin Marinas, Will Deacon
Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
kvmarm, linux-kernel
In preparation for accessing pkvm_hyp_vm structures at EL2 in a context
where we can't always expect a vCPU to be loaded (e.g. MMU notifiers),
introduce get/put helpers to get temporary references to hyp VMs from
any context.
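The get/put pattern can be sketched as follows. This is a simplified
model under stated assumptions: the VM table is a small array, the
reference count is a plain integer stored in the VM itself, and the
lock that the real helpers take around the table is left out. The
names (struct hyp_vm, vm_table, get_vm(), put_vm(), MAX_VMS) are
hypothetical.

```c
#include <stddef.h>

#define MAX_VMS 4

struct hyp_vm {
	int refcount;
};

/* Handle-indexed table of live VMs; empty slots are NULL. */
static struct hyp_vm *vm_table[MAX_VMS];

/* Look up a VM by handle and take a temporary reference on it. */
static struct hyp_vm *get_vm(unsigned int handle)
{
	struct hyp_vm *vm;

	if (handle >= MAX_VMS)
		return NULL;

	vm = vm_table[handle];
	if (vm)
		vm->refcount++;

	return vm;
}

/* Drop a reference previously obtained with get_vm(). */
static void put_vm(struct hyp_vm *vm)
{
	vm->refcount--;
}
```

Every successful get_vm() is expected to be paired with exactly one
put_vm() once the caller is done with the VM.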
Signed-off-by: Quentin Perret <qperret@google.com>
---
arch/arm64/kvm/hyp/include/nvhe/pkvm.h | 3 +++
arch/arm64/kvm/hyp/nvhe/pkvm.c | 20 ++++++++++++++++++++
2 files changed, 23 insertions(+)
diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index 6940eb171a52..be52c5b15e21 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -77,4 +77,7 @@ struct pkvm_hyp_vcpu *pkvm_load_hyp_vcpu(pkvm_handle_t handle,
void pkvm_put_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu);
struct pkvm_hyp_vcpu *pkvm_get_loaded_hyp_vcpu(void);
+struct pkvm_hyp_vm *get_pkvm_hyp_vm(pkvm_handle_t handle);
+void put_pkvm_hyp_vm(struct pkvm_hyp_vm *hyp_vm);
+
#endif /* __ARM64_KVM_NVHE_PKVM_H__ */
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 9ed2b8a63371..d242da1ec56a 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -317,6 +317,26 @@ struct pkvm_hyp_vcpu *pkvm_get_loaded_hyp_vcpu(void)
return __this_cpu_read(loaded_hyp_vcpu);
}
+struct pkvm_hyp_vm *get_pkvm_hyp_vm(pkvm_handle_t handle)
+{
+ struct pkvm_hyp_vm *hyp_vm;
+
+ hyp_spin_lock(&vm_table_lock);
+ hyp_vm = get_vm_by_handle(handle);
+ if (hyp_vm)
+ hyp_page_ref_inc(hyp_virt_to_page(hyp_vm));
+ hyp_spin_unlock(&vm_table_lock);
+
+ return hyp_vm;
+}
+
+void put_pkvm_hyp_vm(struct pkvm_hyp_vm *hyp_vm)
+{
+ hyp_spin_lock(&vm_table_lock);
+ hyp_page_ref_dec(hyp_virt_to_page(hyp_vm));
+ hyp_spin_unlock(&vm_table_lock);
+}
+
static void unpin_host_vcpu(struct kvm_vcpu *host_vcpu)
{
if (host_vcpu)
--
2.47.0.163.g1226f6d8fa-goog
* [PATCH 10/18] KVM: arm64: Introduce __pkvm_host_share_guest()
From: Quentin Perret @ 2024-11-04 13:31 UTC (permalink / raw)
To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Catalin Marinas, Will Deacon
Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
kvmarm, linux-kernel
In preparation for handling guest stage-2 mappings at EL2, introduce a
new pKVM hypercall allowing the host to share pages with non-protected
guests.
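The host-side bookkeeping for this transition can be modelled roughly
as below. It is a sketch, not the hypercall itself: it ignores locking,
the guest-side check that the target IPA is unmapped, and the actual
stage-2 mapping. The names (enum page_state, struct page_model,
host_share_guest()) are hypothetical, and where the patch flags an
inconsistent share count with WARN_ON() the model simply rejects the
request.

```c
#include <errno.h>

enum page_state {
	PAGE_OWNED,
	PAGE_SHARED_OWNED,
	PAGE_SHARED_BORROWED,
};

struct page_model {
	enum page_state host_state;
	unsigned int share_count;	/* number of guest mappings */
};

static int host_share_guest(struct page_model *page)
{
	switch (page->host_state) {
	case PAGE_OWNED:
		/* First share: the host keeps ownership but now shares. */
		page->host_state = PAGE_SHARED_OWNED;
		break;
	case PAGE_SHARED_OWNED:
		/* Only host to np-guest multi-sharing is tolerated. */
		if (!page->share_count)
			return -EPERM;
		break;
	default:
		return -EPERM;
	}

	page->share_count++;
	return 0;
}
```

Sharing the same page a second time leaves the host state untouched and
only bumps the count.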
Signed-off-by: Quentin Perret <qperret@google.com>
---
arch/arm64/include/asm/kvm_asm.h | 1 +
arch/arm64/include/asm/kvm_host.h | 3 +
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 1 +
arch/arm64/kvm/hyp/include/nvhe/memory.h | 2 +
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 34 +++++++++
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 70 +++++++++++++++++++
arch/arm64/kvm/hyp/nvhe/pkvm.c | 7 ++
7 files changed, 118 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index a1c6dbec1871..b69390108c5a 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -65,6 +65,7 @@ enum __kvm_host_smccc_func {
/* Hypercalls available after pKVM finalisation */
__KVM_HOST_SMCCC_FUNC___pkvm_host_share_hyp,
__KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_hyp,
+ __KVM_HOST_SMCCC_FUNC___pkvm_host_share_guest,
__KVM_HOST_SMCCC_FUNC___kvm_adjust_pc,
__KVM_HOST_SMCCC_FUNC___kvm_vcpu_run,
__KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index bf64fed9820e..4b02904ec7c0 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -762,6 +762,9 @@ struct kvm_vcpu_arch {
/* Cache some mmu pages needed inside spinlock regions */
struct kvm_mmu_memory_cache mmu_page_cache;
+ /* Pages to be donated to pkvm/EL2 if it runs out */
+ struct kvm_hyp_memcache pkvm_memcache;
+
/* Virtual SError ESR to restore when HCR_EL2.VSE is set */
u64 vsesr_el2;
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 25038ac705d8..a7976e50f556 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -39,6 +39,7 @@ int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages);
int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
+int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu, enum kvm_pgtable_prot prot);
bool addr_is_memory(phys_addr_t phys);
int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
index 08f3a0416d4c..457318215155 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
@@ -47,6 +47,8 @@ struct hyp_page {
/* Host (non-meta) state. Guarded by the host stage-2 lock. */
enum pkvm_page_state host_state : 8;
+
+ u32 host_share_guest_count;
};
extern u64 __hyp_vmemmap;
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 6bcdba4fdc76..32bdf6b27958 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -209,6 +209,39 @@ static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
cpu_reg(host_ctxt, 1) = ret;
}
+static int pkvm_refill_memcache(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+ struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
+
+ return refill_memcache(&hyp_vcpu->vcpu.arch.pkvm_memcache,
+ host_vcpu->arch.pkvm_memcache.nr_pages,
+ &host_vcpu->arch.pkvm_memcache);
+}
+
+static void handle___pkvm_host_share_guest(struct kvm_cpu_context *host_ctxt)
+{
+ DECLARE_REG(u64, pfn, host_ctxt, 1);
+ DECLARE_REG(u64, gfn, host_ctxt, 2);
+ DECLARE_REG(enum kvm_pgtable_prot, prot, host_ctxt, 3);
+ struct pkvm_hyp_vcpu *hyp_vcpu;
+ int ret = -EINVAL;
+
+ if (!is_protected_kvm_enabled())
+ goto out;
+
+ hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
+ if (!hyp_vcpu || pkvm_hyp_vcpu_is_protected(hyp_vcpu))
+ goto out;
+
+ ret = pkvm_refill_memcache(hyp_vcpu);
+ if (ret)
+ goto out;
+
+ ret = __pkvm_host_share_guest(pfn, gfn, hyp_vcpu, prot);
+out:
+ cpu_reg(host_ctxt, 1) = ret;
+}
+
static void handle___kvm_adjust_pc(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
@@ -425,6 +458,7 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__pkvm_host_share_hyp),
HANDLE_FUNC(__pkvm_host_unshare_hyp),
+ HANDLE_FUNC(__pkvm_host_share_guest),
HANDLE_FUNC(__kvm_adjust_pc),
HANDLE_FUNC(__kvm_vcpu_run),
HANDLE_FUNC(__kvm_flush_vm_context),
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 1595081c4f6b..a69d7212b64c 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -861,6 +861,27 @@ static int hyp_complete_donation(u64 addr,
return pkvm_create_mappings_locked(start, end, prot);
}
+static enum pkvm_page_state guest_get_page_state(kvm_pte_t pte, u64 addr)
+{
+ if (!kvm_pte_valid(pte))
+ return PKVM_NOPAGE;
+
+ return pkvm_getstate(kvm_pgtable_stage2_pte_prot(pte));
+}
+
+static int __guest_check_page_state_range(struct pkvm_hyp_vcpu *vcpu, u64 addr,
+ u64 size, enum pkvm_page_state state)
+{
+ struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
+ struct check_walk_data d = {
+ .desired = state,
+ .get_page_state = guest_get_page_state,
+ };
+
+ hyp_assert_lock_held(&vm->lock);
+ return check_page_state_range(&vm->pgt, addr, size, &d);
+}
+
static int check_share(struct pkvm_mem_share *share)
{
const struct pkvm_mem_transition *tx = &share->tx;
@@ -1343,3 +1364,52 @@ int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages)
return ret;
}
+
+int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu,
+ enum kvm_pgtable_prot prot)
+{
+ struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
+ u64 phys = hyp_pfn_to_phys(pfn);
+ u64 ipa = hyp_pfn_to_phys(gfn);
+ struct hyp_page *page;
+ int ret;
+
+ if (prot & ~KVM_PGTABLE_PROT_RWX)
+ return -EINVAL;
+
+ ret = range_is_allowed_memory(phys, phys + PAGE_SIZE);
+ if (ret)
+ return ret;
+
+ host_lock_component();
+ guest_lock_component(vm);
+
+ ret = __guest_check_page_state_range(vcpu, ipa, PAGE_SIZE, PKVM_NOPAGE);
+ if (ret)
+ goto unlock;
+
+ page = hyp_phys_to_page(phys);
+ switch (page->host_state) {
+ case PKVM_PAGE_OWNED:
+ WARN_ON(__host_set_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_SHARED_OWNED));
+ break;
+ case PKVM_PAGE_SHARED_OWNED:
+ /* Only host to np-guest multi-sharing is tolerated */
+ WARN_ON(!page->host_share_guest_count);
+ break;
+ default:
+ ret = -EPERM;
+ goto unlock;
+ }
+
+ WARN_ON(kvm_pgtable_stage2_map(&vm->pgt, ipa, PAGE_SIZE, phys,
+ pkvm_mkstate(prot, PKVM_PAGE_SHARED_BORROWED),
+ &vcpu->vcpu.arch.pkvm_memcache, 0));
+ page->host_share_guest_count++;
+
+unlock:
+ guest_unlock_component(vm);
+ host_unlock_component();
+
+ return ret;
+}
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index d242da1ec56a..bdcfcc20cf66 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -680,6 +680,13 @@ int __pkvm_teardown_vm(pkvm_handle_t handle)
/* Push the metadata pages to the teardown memcache */
for (idx = 0; idx < hyp_vm->nr_vcpus; ++idx) {
struct pkvm_hyp_vcpu *hyp_vcpu = hyp_vm->vcpus[idx];
+ struct kvm_hyp_memcache *vcpu_mc = &hyp_vcpu->vcpu.arch.pkvm_memcache;
+
+ while (vcpu_mc->nr_pages) {
+ void *addr = pop_hyp_memcache(vcpu_mc, hyp_phys_to_virt);
+ push_hyp_memcache(mc, addr, hyp_virt_to_phys);
+ unmap_donated_memory_noclear(addr, PAGE_SIZE);
+ }
teardown_donated_memory(mc, hyp_vcpu, sizeof(*hyp_vcpu));
}
--
2.47.0.163.g1226f6d8fa-goog
* [PATCH 11/18] KVM: arm64: Introduce __pkvm_host_unshare_guest()
From: Quentin Perret @ 2024-11-04 13:31 UTC (permalink / raw)
To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Catalin Marinas, Will Deacon
Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
kvmarm, linux-kernel
In preparation for letting the host unmap pages from non-protected
guests, introduce a new hypercall implementing the host-unshare-guest
transition.
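The reverse bookkeeping can be modelled in the same spirit. Again this
is only a sketch under assumptions: no locking, no page-table lookup or
unmap, and hypothetical names (enum page_state, struct page_model,
host_unshare_guest()). A zero share count, which the patch treats as a
WARN_ON() condition, is rejected here.

```c
#include <errno.h>

enum page_state {
	PAGE_OWNED,
	PAGE_SHARED_OWNED,
	PAGE_SHARED_BORROWED,
};

struct page_model {
	enum page_state host_state;
	unsigned int share_count;	/* number of guest mappings */
};

static int host_unshare_guest(struct page_model *page)
{
	/* Only pages the host currently shares can be unshared. */
	if (page->host_state != PAGE_SHARED_OWNED || !page->share_count)
		return -EPERM;

	/* The last unshare returns the page to exclusive host ownership. */
	if (--page->share_count == 0)
		page->host_state = PAGE_OWNED;

	return 0;
}
```

The page only goes back to the owned state once every guest mapping has
been removed.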
Signed-off-by: Quentin Perret <qperret@google.com>
---
arch/arm64/include/asm/kvm_asm.h | 1 +
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 1 +
arch/arm64/kvm/hyp/include/nvhe/pkvm.h | 5 ++
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 24 ++++++
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 78 +++++++++++++++++++
5 files changed, 109 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index b69390108c5a..e67efee936b6 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -66,6 +66,7 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___pkvm_host_share_hyp,
__KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_hyp,
__KVM_HOST_SMCCC_FUNC___pkvm_host_share_guest,
+ __KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_guest,
__KVM_HOST_SMCCC_FUNC___kvm_adjust_pc,
__KVM_HOST_SMCCC_FUNC___kvm_vcpu_run,
__KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index a7976e50f556..e528a42ed60e 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -40,6 +40,7 @@ int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu, enum kvm_pgtable_prot prot);
+int __pkvm_host_unshare_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm);
bool addr_is_memory(phys_addr_t phys);
int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index be52c5b15e21..5dfc9ece9aa5 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -64,6 +64,11 @@ static inline bool pkvm_hyp_vcpu_is_protected(struct pkvm_hyp_vcpu *hyp_vcpu)
return vcpu_is_protected(&hyp_vcpu->vcpu);
}
+static inline bool pkvm_hyp_vm_is_protected(struct pkvm_hyp_vm *hyp_vm)
+{
+ return kvm_vm_is_protected(&hyp_vm->kvm);
+}
+
void pkvm_hyp_vm_table_init(void *tbl);
int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 32bdf6b27958..68bbef69d99a 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -242,6 +242,29 @@ static void handle___pkvm_host_share_guest(struct kvm_cpu_context *host_ctxt)
cpu_reg(host_ctxt, 1) = ret;
}
+static void handle___pkvm_host_unshare_guest(struct kvm_cpu_context *host_ctxt)
+{
+ DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
+ DECLARE_REG(u64, gfn, host_ctxt, 2);
+ struct pkvm_hyp_vm *hyp_vm;
+ int ret = -EINVAL;
+
+ if (!is_protected_kvm_enabled())
+ goto out;
+
+ hyp_vm = get_pkvm_hyp_vm(handle);
+ if (!hyp_vm)
+ goto out;
+ if (pkvm_hyp_vm_is_protected(hyp_vm))
+ goto put_hyp_vm;
+
+ ret = __pkvm_host_unshare_guest(gfn, hyp_vm);
+put_hyp_vm:
+ put_pkvm_hyp_vm(hyp_vm);
+out:
+ cpu_reg(host_ctxt, 1) = ret;
+}
+
static void handle___kvm_adjust_pc(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
@@ -459,6 +482,7 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__pkvm_host_share_hyp),
HANDLE_FUNC(__pkvm_host_unshare_hyp),
HANDLE_FUNC(__pkvm_host_share_guest),
+ HANDLE_FUNC(__pkvm_host_unshare_guest),
HANDLE_FUNC(__kvm_adjust_pc),
HANDLE_FUNC(__kvm_vcpu_run),
HANDLE_FUNC(__kvm_flush_vm_context),
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index a69d7212b64c..f7476a29e1a9 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -1413,3 +1413,81 @@ int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu,
return ret;
}
+
+static int guest_get_valid_pte(struct pkvm_hyp_vm *vm, u64 *phys, u64 ipa, u8 order, kvm_pte_t *pte)
+{
+ size_t size = PAGE_SIZE << order;
+ s8 level;
+
+ if (order && size != PMD_SIZE)
+ return -EINVAL;
+
+ WARN_ON(kvm_pgtable_get_leaf(&vm->pgt, ipa, pte, &level));
+
+ if (kvm_granule_size(level) != size)
+ return -E2BIG;
+
+ if (!kvm_pte_valid(*pte))
+ return -ENOENT;
+
+ *phys = kvm_pte_to_phys(*pte);
+
+ return 0;
+}
+
+static int __check_host_unshare_guest(struct pkvm_hyp_vm *vm, u64 *phys, u64 ipa)
+{
+ enum pkvm_page_state state;
+ struct hyp_page *page;
+ kvm_pte_t pte;
+ int ret;
+
+ ret = guest_get_valid_pte(vm, phys, ipa, 0, &pte);
+ if (ret)
+ return ret;
+
+ state = guest_get_page_state(pte, ipa);
+ if (state != PKVM_PAGE_SHARED_BORROWED)
+ return -EPERM;
+
+ ret = range_is_allowed_memory(*phys, *phys + PAGE_SIZE);
+ if (ret)
+ return ret;
+
+ page = hyp_phys_to_page(*phys);
+ if (page->host_state != PKVM_PAGE_SHARED_OWNED)
+ return -EPERM;
+ WARN_ON(!page->host_share_guest_count);
+
+ return 0;
+}
+
+int __pkvm_host_unshare_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm)
+{
+ u64 ipa = hyp_pfn_to_phys(gfn);
+ struct hyp_page *page;
+ u64 phys;
+ int ret;
+
+ host_lock_component();
+ guest_lock_component(hyp_vm);
+
+ ret = __check_host_unshare_guest(hyp_vm, &phys, ipa);
+ if (ret)
+ goto unlock;
+
+ ret = kvm_pgtable_stage2_unmap(&hyp_vm->pgt, ipa, PAGE_SIZE);
+ if (ret)
+ goto unlock;
+
+ page = hyp_phys_to_page(phys);
+ page->host_share_guest_count--;
+ if (!page->host_share_guest_count)
+ WARN_ON(__host_set_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_OWNED));
+
+unlock:
+ guest_unlock_component(hyp_vm);
+ host_unlock_component();
+
+ return ret;
+}
--
2.47.0.163.g1226f6d8fa-goog
* [PATCH 12/18] KVM: arm64: Introduce __pkvm_host_relax_guest_perms()
From: Quentin Perret @ 2024-11-04 13:31 UTC (permalink / raw)
To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Catalin Marinas, Will Deacon
Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
kvmarm, linux-kernel
Introduce a new hypercall allowing the host to relax the stage-2
permissions of mappings in a non-protected guest page-table. It will be
used later once we start allowing RO memslots and dirty logging.
Signed-off-by: Quentin Perret <qperret@google.com>
---
arch/arm64/include/asm/kvm_asm.h | 1 +
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 1 +
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 20 +++++++++++++++
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 25 +++++++++++++++++++
4 files changed, 47 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index e67efee936b6..f528656e8359 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -67,6 +67,7 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_hyp,
__KVM_HOST_SMCCC_FUNC___pkvm_host_share_guest,
__KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_guest,
+ __KVM_HOST_SMCCC_FUNC___pkvm_host_relax_guest_perms,
__KVM_HOST_SMCCC_FUNC___kvm_adjust_pc,
__KVM_HOST_SMCCC_FUNC___kvm_vcpu_run,
__KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index e528a42ed60e..db0dd83c2457 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -41,6 +41,7 @@ int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu, enum kvm_pgtable_prot prot);
int __pkvm_host_unshare_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm);
+int __pkvm_host_relax_guest_perms(u64 gfn, enum kvm_pgtable_prot prot, struct pkvm_hyp_vcpu *vcpu);
bool addr_is_memory(phys_addr_t phys);
int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 68bbef69d99a..d3210719e247 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -265,6 +265,25 @@ static void handle___pkvm_host_unshare_guest(struct kvm_cpu_context *host_ctxt)
cpu_reg(host_ctxt, 1) = ret;
}
+static void handle___pkvm_host_relax_guest_perms(struct kvm_cpu_context *host_ctxt)
+{
+ DECLARE_REG(u64, gfn, host_ctxt, 1);
+ DECLARE_REG(enum kvm_pgtable_prot, prot, host_ctxt, 2);
+ struct pkvm_hyp_vcpu *hyp_vcpu;
+ int ret = -EINVAL;
+
+ if (!is_protected_kvm_enabled())
+ goto out;
+
+ hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
+ if (!hyp_vcpu || pkvm_hyp_vcpu_is_protected(hyp_vcpu))
+ goto out;
+
+ ret = __pkvm_host_relax_guest_perms(gfn, prot, hyp_vcpu);
+out:
+ cpu_reg(host_ctxt, 1) = ret;
+}
+
static void handle___kvm_adjust_pc(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
@@ -483,6 +502,7 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__pkvm_host_unshare_hyp),
HANDLE_FUNC(__pkvm_host_share_guest),
HANDLE_FUNC(__pkvm_host_unshare_guest),
+ HANDLE_FUNC(__pkvm_host_relax_guest_perms),
HANDLE_FUNC(__kvm_adjust_pc),
HANDLE_FUNC(__kvm_vcpu_run),
HANDLE_FUNC(__kvm_flush_vm_context),
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index f7476a29e1a9..fc6050dcf904 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -1491,3 +1491,28 @@ int __pkvm_host_unshare_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm)
return ret;
}
+
+int __pkvm_host_relax_guest_perms(u64 gfn, enum kvm_pgtable_prot prot, struct pkvm_hyp_vcpu *vcpu)
+{
+ struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
+ u64 ipa = hyp_pfn_to_phys(gfn);
+ u64 phys;
+ int ret;
+
+ if ((prot & KVM_PGTABLE_PROT_RWX) != prot)
+ return -EPERM;
+
+ host_lock_component();
+ guest_lock_component(vm);
+
+ ret = __check_host_unshare_guest(vm, &phys, ipa);
+ if (ret)
+ goto unlock;
+
+ ret = kvm_pgtable_stage2_relax_perms(&vm->pgt, ipa, prot, 0);
+unlock:
+ guest_unlock_component(vm);
+ host_unlock_component();
+
+ return ret;
+}
--
2.47.0.163.g1226f6d8fa-goog
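
Note on the permission check in __pkvm_host_relax_guest_perms() above: the test
(prot & KVM_PGTABLE_PROT_RWX) != prot rejects any request that carries bits
outside the R/W/X set. A small standalone sketch of the same subset test
follows. The PROT_* values are illustrative assumptions chosen for this
example and should not be read as the kernel's enum kvm_pgtable_prot encoding;
check_relax_prot() is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Illustrative bit values only; not taken from the kernel headers. */
#define PROT_X		(UINT64_C(1) << 0)
#define PROT_W		(UINT64_C(1) << 1)
#define PROT_R		(UINT64_C(1) << 2)
#define PROT_RWX	(PROT_R | PROT_W | PROT_X)
#define PROT_DEVICE	(UINT64_C(1) << 3)

/* Accept prot only if every set bit belongs to the R/W/X mask. */
static int check_relax_prot(uint64_t prot)
{
	if ((prot & PROT_RWX) != prot)
		return -EPERM;
	return 0;
}
```

Masking and comparing against the original value is a compact way to express
"prot is a subset of RWX" without enumerating the forbidden bits.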
* [PATCH 13/18] KVM: arm64: Introduce __pkvm_host_wrprotect_guest()
2024-11-04 13:31 [PATCH 00/18] KVM: arm64: Non-protected guest stage-2 support for pKVM Quentin Perret
` (11 preceding siblings ...)
2024-11-04 13:31 ` [PATCH 12/18] KVM: arm64: Introduce __pkvm_host_relax_guest_perms() Quentin Perret
@ 2024-11-04 13:31 ` Quentin Perret
2024-11-04 13:32 ` [PATCH 14/18] KVM: arm64: Introduce __pkvm_host_test_clear_young_guest() Quentin Perret
` (4 subsequent siblings)
17 siblings, 0 replies; 24+ messages in thread
From: Quentin Perret @ 2024-11-04 13:31 UTC (permalink / raw)
To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Catalin Marinas, Will Deacon
Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
kvmarm, linux-kernel
Introduce a new hypercall to remove the write permission from a
non-protected guest stage-2 mapping. This will be used for e.g. enabling
dirty logging.
Signed-off-by: Quentin Perret <qperret@google.com>
---
arch/arm64/include/asm/kvm_asm.h | 1 +
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 1 +
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 24 +++++++++++++++++++
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 21 ++++++++++++++++
4 files changed, 47 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index f528656e8359..3f1f0760c375 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -68,6 +68,7 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___pkvm_host_share_guest,
__KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_guest,
__KVM_HOST_SMCCC_FUNC___pkvm_host_relax_guest_perms,
+ __KVM_HOST_SMCCC_FUNC___pkvm_host_wrprotect_guest,
__KVM_HOST_SMCCC_FUNC___kvm_adjust_pc,
__KVM_HOST_SMCCC_FUNC___kvm_vcpu_run,
__KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index db0dd83c2457..8658b5932473 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -42,6 +42,7 @@ int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu, enum kvm_pgtable_prot prot);
int __pkvm_host_unshare_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm);
int __pkvm_host_relax_guest_perms(u64 gfn, enum kvm_pgtable_prot prot, struct pkvm_hyp_vcpu *vcpu);
+int __pkvm_host_wrprotect_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm);
bool addr_is_memory(phys_addr_t phys);
int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index d3210719e247..ce33079072c0 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -284,6 +284,29 @@ static void handle___pkvm_host_relax_guest_perms(struct kvm_cpu_context *host_ct
cpu_reg(host_ctxt, 1) = ret;
}
+static void handle___pkvm_host_wrprotect_guest(struct kvm_cpu_context *host_ctxt)
+{
+ DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
+ DECLARE_REG(u64, gfn, host_ctxt, 2);
+ struct pkvm_hyp_vm *hyp_vm;
+ int ret = -EINVAL;
+
+ if (!is_protected_kvm_enabled())
+ goto out;
+
+ hyp_vm = get_pkvm_hyp_vm(handle);
+ if (!hyp_vm)
+ goto out;
+ if (pkvm_hyp_vm_is_protected(hyp_vm))
+ goto put_hyp_vm;
+
+ ret = __pkvm_host_wrprotect_guest(gfn, hyp_vm);
+put_hyp_vm:
+ put_pkvm_hyp_vm(hyp_vm);
+out:
+ cpu_reg(host_ctxt, 1) = ret;
+}
+
static void handle___kvm_adjust_pc(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
@@ -503,6 +526,7 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__pkvm_host_share_guest),
HANDLE_FUNC(__pkvm_host_unshare_guest),
HANDLE_FUNC(__pkvm_host_relax_guest_perms),
+ HANDLE_FUNC(__pkvm_host_wrprotect_guest),
HANDLE_FUNC(__kvm_adjust_pc),
HANDLE_FUNC(__kvm_vcpu_run),
HANDLE_FUNC(__kvm_flush_vm_context),
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index fc6050dcf904..3a8751175fd5 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -1516,3 +1516,24 @@ int __pkvm_host_relax_guest_perms(u64 gfn, enum kvm_pgtable_prot prot, struct pk
return ret;
}
+
+int __pkvm_host_wrprotect_guest(u64 gfn, struct pkvm_hyp_vm *vm)
+{
+ u64 ipa = hyp_pfn_to_phys(gfn);
+ u64 phys;
+ int ret;
+
+ host_lock_component();
+ guest_lock_component(vm);
+
+ ret = __check_host_unshare_guest(vm, &phys, ipa);
+ if (ret)
+ goto unlock;
+
+ ret = kvm_pgtable_stage2_wrprotect(&vm->pgt, ipa, PAGE_SIZE);
+unlock:
+ guest_unlock_component(vm);
+ host_unlock_component();
+
+ return ret;
+}
--
2.47.0.163.g1226f6d8fa-goog
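
Note on the handler shape used by handle___pkvm_host_wrprotect_guest() above:
the VM is looked up by handle, protected VMs are rejected, and the reference
taken by the lookup is dropped on every path that obtained it. The sketch below
reproduces that control flow in plain C under stated assumptions: struct vm,
get_vm(), put_vm() and handle_op() are hypothetical, and the handle is modelled
as a simple array index.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical VM object with a reference count. */
struct vm {
	bool is_protected;
	int refcount;
};

/* Look up a VM by handle and take a reference, or return NULL. */
static struct vm *get_vm(struct vm *table, size_t nr, size_t handle)
{
	if (handle >= nr)
		return NULL;
	table[handle].refcount++;
	return &table[handle];
}

/* Drop a reference taken by get_vm(). */
static void put_vm(struct vm *vm)
{
	assert(vm->refcount > 0);
	vm->refcount--;
}

/* Returns 0 when the operation may proceed, -EINVAL otherwise. */
static int handle_op(struct vm *table, size_t nr, size_t handle)
{
	struct vm *vm;
	int ret = -EINVAL;

	vm = get_vm(table, nr, handle);
	if (!vm)
		goto out;
	if (vm->is_protected)
		goto put;

	ret = 0;	/* a real handler would call the page-table helper here */
put:
	put_vm(vm);
out:
	return ret;
}
```

Initialising ret to the error value and using two labels keeps the reference
balanced without duplicating the put_vm() call.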
* [PATCH 14/18] KVM: arm64: Introduce __pkvm_host_test_clear_young_guest()
2024-11-04 13:31 [PATCH 00/18] KVM: arm64: Non-protected guest stage-2 support for pKVM Quentin Perret
` (12 preceding siblings ...)
2024-11-04 13:31 ` [PATCH 13/18] KVM: arm64: Introduce __pkvm_host_wrprotect_guest() Quentin Perret
@ 2024-11-04 13:32 ` Quentin Perret
2024-11-04 13:32 ` [PATCH 15/18] KVM: arm64: Introduce __pkvm_host_mkyoung_guest() Quentin Perret
` (3 subsequent siblings)
17 siblings, 0 replies; 24+ messages in thread
From: Quentin Perret @ 2024-11-04 13:32 UTC (permalink / raw)
To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Catalin Marinas, Will Deacon
Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
kvmarm, linux-kernel
Plumb the kvm_pgtable_stage2_test_clear_young() callback into pKVM for
non-protected guests. It will later be called from MMU notifiers.
Signed-off-by: Quentin Perret <qperret@google.com>
---
arch/arm64/include/asm/kvm_asm.h | 1 +
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 1 +
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 25 +++++++++++++++++++
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 21 ++++++++++++++++
4 files changed, 48 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 3f1f0760c375..acb36762e15f 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -69,6 +69,7 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_guest,
__KVM_HOST_SMCCC_FUNC___pkvm_host_relax_guest_perms,
__KVM_HOST_SMCCC_FUNC___pkvm_host_wrprotect_guest,
+ __KVM_HOST_SMCCC_FUNC___pkvm_host_test_clear_young_guest,
__KVM_HOST_SMCCC_FUNC___kvm_adjust_pc,
__KVM_HOST_SMCCC_FUNC___kvm_vcpu_run,
__KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 8658b5932473..554ce31882e6 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -43,6 +43,7 @@ int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu, enum k
int __pkvm_host_unshare_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm);
int __pkvm_host_relax_guest_perms(u64 gfn, enum kvm_pgtable_prot prot, struct pkvm_hyp_vcpu *vcpu);
int __pkvm_host_wrprotect_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm);
+int __pkvm_host_test_clear_young_guest(u64 gfn, bool mkold, struct pkvm_hyp_vm *vm);
bool addr_is_memory(phys_addr_t phys);
int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index ce33079072c0..21c8a5e74d14 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -307,6 +307,30 @@ static void handle___pkvm_host_wrprotect_guest(struct kvm_cpu_context *host_ctxt
cpu_reg(host_ctxt, 1) = ret;
}
+static void handle___pkvm_host_test_clear_young_guest(struct kvm_cpu_context *host_ctxt)
+{
+ DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
+ DECLARE_REG(u64, gfn, host_ctxt, 2);
+ DECLARE_REG(bool, mkold, host_ctxt, 3);
+ struct pkvm_hyp_vm *hyp_vm;
+ int ret = -EINVAL;
+
+ if (!is_protected_kvm_enabled())
+ goto out;
+
+ hyp_vm = get_pkvm_hyp_vm(handle);
+ if (!hyp_vm)
+ goto out;
+ if (pkvm_hyp_vm_is_protected(hyp_vm))
+ goto put_hyp_vm;
+
+ ret = __pkvm_host_test_clear_young_guest(gfn, mkold, hyp_vm);
+put_hyp_vm:
+ put_pkvm_hyp_vm(hyp_vm);
+out:
+ cpu_reg(host_ctxt, 1) = ret;
+}
+
static void handle___kvm_adjust_pc(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
@@ -527,6 +551,7 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__pkvm_host_unshare_guest),
HANDLE_FUNC(__pkvm_host_relax_guest_perms),
HANDLE_FUNC(__pkvm_host_wrprotect_guest),
+ HANDLE_FUNC(__pkvm_host_test_clear_young_guest),
HANDLE_FUNC(__kvm_adjust_pc),
HANDLE_FUNC(__kvm_vcpu_run),
HANDLE_FUNC(__kvm_flush_vm_context),
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 3a8751175fd5..7c2aca459deb 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -1537,3 +1537,24 @@ int __pkvm_host_wrprotect_guest(u64 gfn, struct pkvm_hyp_vm *vm)
return ret;
}
+
+int __pkvm_host_test_clear_young_guest(u64 gfn, bool mkold, struct pkvm_hyp_vm *vm)
+{
+ u64 ipa = hyp_pfn_to_phys(gfn);
+ u64 phys;
+ int ret;
+
+ host_lock_component();
+ guest_lock_component(vm);
+
+ ret = __check_host_unshare_guest(vm, &phys, ipa);
+ if (ret)
+ goto unlock;
+
+ ret = kvm_pgtable_stage2_test_clear_young(&vm->pgt, ipa, PAGE_SIZE, mkold);
+unlock:
+ guest_unlock_component(vm);
+ host_unlock_component();
+
+ return ret;
+}
--
2.47.0.163.g1226f6d8fa-goog
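
Note on the test-and-clear-young operation plumbed above: the idea is to report
whether a mapping has been accessed and, when mkold is set, to clear the access
flag so the next access can be observed again. The sketch below shows that
semantics on a single descriptor value. It is an illustration, not the
kvm_pgtable implementation: test_clear_young() here is a hypothetical helper,
PTE_AF assumes the Access Flag sits at bit 10 of the descriptor, and no TLB or
atomicity concerns are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the Access Flag in the descriptor. */
#define PTE_AF	(UINT64_C(1) << 10)

/* Report whether the entry is young; optionally mark it old. */
static bool test_clear_young(uint64_t *pte, bool mkold)
{
	bool young = (*pte & PTE_AF) != 0;

	if (young && mkold)
		*pte &= ~PTE_AF;
	return young;
}
```

Calling it with mkold == false is a pure query, which matches how the bool
parameter is threaded through the hypercall above.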
* [PATCH 15/18] KVM: arm64: Introduce __pkvm_host_mkyoung_guest()
2024-11-04 13:31 [PATCH 00/18] KVM: arm64: Non-protected guest stage-2 support for pKVM Quentin Perret
` (13 preceding siblings ...)
2024-11-04 13:32 ` [PATCH 14/18] KVM: arm64: Introduce __pkvm_host_test_clear_young_guest() Quentin Perret
@ 2024-11-04 13:32 ` Quentin Perret
2024-11-04 13:32 ` [PATCH 16/18] KVM: arm64: Introduce __pkvm_tlb_flush_vmid() Quentin Perret
` (2 subsequent siblings)
17 siblings, 0 replies; 24+ messages in thread
From: Quentin Perret @ 2024-11-04 13:32 UTC (permalink / raw)
To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Catalin Marinas, Will Deacon
Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
kvmarm, linux-kernel
Plumb the kvm_pgtable_stage2_mkyoung() callback into pKVM for
non-protected guests. It will be called later from the fault handling
path.
Signed-off-by: Quentin Perret <qperret@google.com>
---
arch/arm64/include/asm/kvm_asm.h | 1 +
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 1 +
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 19 +++++++++++++++
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 24 +++++++++++++++++++
4 files changed, 45 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index acb36762e15f..4b93fb3a9a96 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -70,6 +70,7 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___pkvm_host_relax_guest_perms,
__KVM_HOST_SMCCC_FUNC___pkvm_host_wrprotect_guest,
__KVM_HOST_SMCCC_FUNC___pkvm_host_test_clear_young_guest,
+ __KVM_HOST_SMCCC_FUNC___pkvm_host_mkyoung_guest,
__KVM_HOST_SMCCC_FUNC___kvm_adjust_pc,
__KVM_HOST_SMCCC_FUNC___kvm_vcpu_run,
__KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 554ce31882e6..6ec64f1fee3e 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -44,6 +44,7 @@ int __pkvm_host_unshare_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm);
int __pkvm_host_relax_guest_perms(u64 gfn, enum kvm_pgtable_prot prot, struct pkvm_hyp_vcpu *vcpu);
int __pkvm_host_wrprotect_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm);
int __pkvm_host_test_clear_young_guest(u64 gfn, bool mkold, struct pkvm_hyp_vm *vm);
+kvm_pte_t __pkvm_host_mkyoung_guest(u64 gfn, struct pkvm_hyp_vcpu *vcpu);
bool addr_is_memory(phys_addr_t phys);
int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 21c8a5e74d14..904f6b1edced 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -331,6 +331,24 @@ static void handle___pkvm_host_test_clear_young_guest(struct kvm_cpu_context *ho
cpu_reg(host_ctxt, 1) = ret;
}
+static void handle___pkvm_host_mkyoung_guest(struct kvm_cpu_context *host_ctxt)
+{
+ DECLARE_REG(u64, gfn, host_ctxt, 1);
+ struct pkvm_hyp_vcpu *hyp_vcpu;
+ kvm_pte_t ret = 0;
+
+ if (!is_protected_kvm_enabled())
+ goto out;
+
+ hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
+ if (!hyp_vcpu || pkvm_hyp_vcpu_is_protected(hyp_vcpu))
+ goto out;
+
+ ret = __pkvm_host_mkyoung_guest(gfn, hyp_vcpu);
+out:
+ cpu_reg(host_ctxt, 1) = ret;
+}
+
static void handle___kvm_adjust_pc(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
@@ -552,6 +570,7 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__pkvm_host_relax_guest_perms),
HANDLE_FUNC(__pkvm_host_wrprotect_guest),
HANDLE_FUNC(__pkvm_host_test_clear_young_guest),
+ HANDLE_FUNC(__pkvm_host_mkyoung_guest),
HANDLE_FUNC(__kvm_adjust_pc),
HANDLE_FUNC(__kvm_vcpu_run),
HANDLE_FUNC(__kvm_flush_vm_context),
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 7c2aca459deb..a6a47383135b 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -1558,3 +1558,27 @@ int __pkvm_host_test_clear_young_guest(u64 gfn, bool mkold, struct pkvm_hyp_vm *
return ret;
}
+
+kvm_pte_t __pkvm_host_mkyoung_guest(u64 gfn, struct pkvm_hyp_vcpu *vcpu)
+{
+ struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
+ u64 ipa = hyp_pfn_to_phys(gfn);
+ kvm_pte_t pte = 0;
+ u64 phys;
+ int ret;
+
+ host_lock_component();
+ guest_lock_component(vm);
+
+ ret = __check_host_unshare_guest(vm, &phys, ipa);
+ if (ret)
+ goto unlock;
+
+ pte = kvm_pgtable_stage2_mkyoung(&vm->pgt, ipa, 0);
+unlock:
+ guest_unlock_component(vm);
+ host_unlock_component();
+
+ return pte;
+
+}
--
2.47.0.163.g1226f6d8fa-goog
* [PATCH 16/18] KVM: arm64: Introduce __pkvm_tlb_flush_vmid()
2024-11-04 13:31 [PATCH 00/18] KVM: arm64: Non-protected guest stage-2 support for pKVM Quentin Perret
` (14 preceding siblings ...)
2024-11-04 13:32 ` [PATCH 15/18] KVM: arm64: Introduce __pkvm_host_mkyoung_guest() Quentin Perret
@ 2024-11-04 13:32 ` Quentin Perret
2024-11-04 13:32 ` [PATCH 17/18] KVM: arm64: Introduce the EL1 pKVM MMU Quentin Perret
2024-11-04 13:32 ` [PATCH 18/18] KVM: arm64: Plumb the pKVM MMU in KVM Quentin Perret
17 siblings, 0 replies; 24+ messages in thread
From: Quentin Perret @ 2024-11-04 13:32 UTC (permalink / raw)
To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Catalin Marinas, Will Deacon
Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
kvmarm, linux-kernel
Introduce a new hypercall to flush the TLBs of non-protected guests. The
host kernel will be responsible for issuing this hypercall after changing
stage-2 permissions using the __pkvm_host_relax_guest_perms() or
__pkvm_host_wrprotect_guest() paths. This is left under the host's
responsibility for performance reasons.
Note however that the TLB maintenance for all *unmap* operations still
remains entirely under the hypervisor's responsibility for security
reasons -- an unmapped page may be donated to another entity, so a stale
TLB entry could be used to leak private data.
Signed-off-by: Quentin Perret <qperret@google.com>
---
arch/arm64/include/asm/kvm_asm.h | 1 +
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 17 +++++++++++++++++
2 files changed, 18 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 4b93fb3a9a96..1bf7bc51f50f 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -88,6 +88,7 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___pkvm_teardown_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_load,
__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_put,
+ __KVM_HOST_SMCCC_FUNC___pkvm_tlb_flush_vmid,
};
#define DECLARE_KVM_VHE_SYM(sym) extern char sym[]
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 904f6b1edced..1d8baa14ff1c 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -396,6 +396,22 @@ static void handle___kvm_tlb_flush_vmid(struct kvm_cpu_context *host_ctxt)
__kvm_tlb_flush_vmid(kern_hyp_va(mmu));
}
+static void handle___pkvm_tlb_flush_vmid(struct kvm_cpu_context *host_ctxt)
+{
+ DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
+ struct pkvm_hyp_vm *hyp_vm;
+
+ if (!is_protected_kvm_enabled())
+ return;
+
+ hyp_vm = get_pkvm_hyp_vm(handle);
+ if (!hyp_vm)
+ return;
+
+ __kvm_tlb_flush_vmid(&hyp_vm->kvm.arch.mmu);
+ put_pkvm_hyp_vm(hyp_vm);
+}
+
static void handle___kvm_flush_cpu_context(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(struct kvm_s2_mmu *, mmu, host_ctxt, 1);
@@ -588,6 +604,7 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__pkvm_teardown_vm),
HANDLE_FUNC(__pkvm_vcpu_load),
HANDLE_FUNC(__pkvm_vcpu_put),
+ HANDLE_FUNC(__pkvm_tlb_flush_vmid),
};
static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
--
2.47.0.163.g1226f6d8fa-goog
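
Note on leaving the TLB flush to the host, as described in the commit message
above: one plausible reading of "performance reasons" is that the host can
change permissions on many pages and then issue a single flush for the whole
VMID, rather than paying for a flush per page. The sketch below only counts
calls to make that trade-off visible. Everything in it is hypothetical:
hyp_wrprotect() and hyp_tlb_flush_vmid() are stubs standing in for the
hypercalls, and wrprotect_range() is not a function from this series.

```c
#include <assert.h>
#include <stddef.h>

/* Counters standing in for the cost of each hypercall. */
struct hyp_stats {
	size_t wrprotect_calls;
	size_t tlb_flushes;
};

/* Stub: pretend to write-protect one guest page. */
static void hyp_wrprotect(struct hyp_stats *s)
{
	s->wrprotect_calls++;
}

/* Stub: pretend to flush the TLBs for the whole VMID. */
static void hyp_tlb_flush_vmid(struct hyp_stats *s)
{
	s->tlb_flushes++;
}

/* Write-protect nr_pages pages, then flush once at the end. */
static void wrprotect_range(struct hyp_stats *s, size_t nr_pages)
{
	for (size_t i = 0; i < nr_pages; i++)
		hyp_wrprotect(s);
	if (nr_pages)
		hyp_tlb_flush_vmid(s);
}
```

This batching is only safe for permission changes. As the commit message
notes, unmap operations keep their TLB maintenance inside the hypervisor.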
* [PATCH 17/18] KVM: arm64: Introduce the EL1 pKVM MMU
2024-11-04 13:31 [PATCH 00/18] KVM: arm64: Non-protected guest stage-2 support for pKVM Quentin Perret
` (15 preceding siblings ...)
2024-11-04 13:32 ` [PATCH 16/18] KVM: arm64: Introduce __pkvm_tlb_flush_vmid() Quentin Perret
@ 2024-11-04 13:32 ` Quentin Perret
2024-11-06 16:58 ` Quentin Perret
2024-11-04 13:32 ` [PATCH 18/18] KVM: arm64: Plumb the pKVM MMU in KVM Quentin Perret
17 siblings, 1 reply; 24+ messages in thread
From: Quentin Perret @ 2024-11-04 13:32 UTC (permalink / raw)
To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Catalin Marinas, Will Deacon
Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
kvmarm, linux-kernel
Introduce a set of helper functions for manipulating the pKVM
guest stage-2 page-tables from EL1 using pKVM's HVC interface.
Each helper has an exact one-to-one correspondence with the traditional
kvm_pgtable_stage2_*() functions from pgtable.c, with a strictly
matching prototype. This will ease plumbing later on in mmu.c.
These callbacks track the gfn->pfn mappings in a simple rb-tree indexed
by IPA in lieu of a page-table. This rb-tree is kept in sync with pKVM's
state and is protected by a new rwlock -- the existing mmu_lock
protection does not suffice in the map() path where the tree must be
modified while user_mem_abort() only acquires a read_lock.
Signed-off-by: Quentin Perret <qperret@google.com>
---
The embedded union inside struct kvm_pgtable is arguably a bit horrible
currently... I considered making the pgt argument to all kvm_pgtable_*()
functions an opaque void * ptr, and moving the definition of
struct kvm_pgtable to pgtable.c and the pkvm version into pkvm.c. Given
that the allocation of that data-structure is done by the caller, that
means we'd need to expose kvm_pgtable_get_pgd_size() or something that
each MMU (pgtable.c and pkvm.c) would have to implement and things like
that. But that felt like a bigger surgery, so I went with the simpler
option. Thoughts welcome :-)
Similarly, happy to drop the mappings_lock if we want to teach
user_mem_abort() about taking a write lock on the mmu_lock in the pKVM
case, but again this implementation is the least invasive into normal
KVM so that felt like a reasonable starting point.
---
arch/arm64/include/asm/kvm_host.h | 1 +
arch/arm64/include/asm/kvm_pgtable.h | 27 ++--
arch/arm64/include/asm/kvm_pkvm.h | 28 ++++
arch/arm64/kvm/pkvm.c | 194 +++++++++++++++++++++++++++
4 files changed, 241 insertions(+), 9 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 4b02904ec7c0..2bfb5983f6f1 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -87,6 +87,7 @@ void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu);
struct kvm_hyp_memcache {
phys_addr_t head;
unsigned long nr_pages;
+ struct pkvm_mapping *mapping; /* only used from EL1 */
};
static inline void push_hyp_memcache(struct kvm_hyp_memcache *mc,
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 047e1c06ae4c..9447193ee630 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -412,15 +412,24 @@ static inline bool kvm_pgtable_walk_lock_held(void)
* be used instead of block mappings.
*/
struct kvm_pgtable {
- u32 ia_bits;
- s8 start_level;
- kvm_pteref_t pgd;
- struct kvm_pgtable_mm_ops *mm_ops;
-
- /* Stage-2 only */
- struct kvm_s2_mmu *mmu;
- enum kvm_pgtable_stage2_flags flags;
- kvm_pgtable_force_pte_cb_t force_pte_cb;
+ union {
+ struct {
+ u32 ia_bits;
+ s8 start_level;
+ kvm_pteref_t pgd;
+ struct kvm_pgtable_mm_ops *mm_ops;
+
+ /* Stage-2 only */
+ struct kvm_s2_mmu *mmu;
+ enum kvm_pgtable_stage2_flags flags;
+ kvm_pgtable_force_pte_cb_t force_pte_cb;
+ };
+ struct {
+ struct kvm *kvm;
+ struct rb_root mappings;
+ rwlock_t mappings_lock;
+ } pkvm;
+ };
};
/**
diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
index cd56acd9a842..f3eed6a5fa57 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -11,6 +11,12 @@
#include <linux/scatterlist.h>
#include <asm/kvm_pgtable.h>
+struct pkvm_mapping {
+ u64 gfn;
+ u64 pfn;
+ struct rb_node node;
+};
+
/* Maximum number of VMs that can co-exist under pKVM. */
#define KVM_MAX_PVMS 255
@@ -137,4 +143,26 @@ static inline size_t pkvm_host_sve_state_size(void)
SVE_SIG_REGS_SIZE(sve_vq_from_vl(kvm_host_sve_max_vl)));
}
+static inline pkvm_handle_t pkvm_pgt_to_handle(struct kvm_pgtable *pgt)
+{
+ return pgt->pkvm.kvm->arch.pkvm.handle;
+}
+
+int pkvm_pgtable_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu, struct kvm_pgtable_mm_ops *mm_ops);
+void pkvm_pgtable_destroy(struct kvm_pgtable *pgt);
+int pkvm_pgtable_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
+ u64 phys, enum kvm_pgtable_prot prot,
+ void *mc, enum kvm_pgtable_walk_flags flags);
+int pkvm_pgtable_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size);
+int pkvm_pgtable_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size);
+int pkvm_pgtable_flush(struct kvm_pgtable *pgt, u64 addr, u64 size);
+bool pkvm_pgtable_test_clear_young(struct kvm_pgtable *pgt, u64 addr, u64 size, bool mkold);
+int pkvm_pgtable_relax_perms(struct kvm_pgtable *pgt, u64 addr, enum kvm_pgtable_prot prot,
+ enum kvm_pgtable_walk_flags flags);
+kvm_pte_t pkvm_pgtable_mkyoung(struct kvm_pgtable *pgt, u64 addr, enum kvm_pgtable_walk_flags flags);
+int pkvm_pgtable_split(struct kvm_pgtable *pgt, u64 addr, u64 size, struct kvm_mmu_memory_cache *mc);
+void pkvm_pgtable_free_unlinked(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, s8 level);
+kvm_pte_t *pkvm_pgtable_create_unlinked(struct kvm_pgtable *pgt, u64 phys, s8 level,
+ enum kvm_pgtable_prot prot, void *mc, bool force_pte);
+
#endif /* __ARM64_KVM_PKVM_H__ */
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 85117ea8f351..6d04a1a0fc6b 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -7,6 +7,7 @@
#include <linux/init.h>
#include <linux/kmemleak.h>
#include <linux/kvm_host.h>
+#include <asm/kvm_mmu.h>
#include <linux/memblock.h>
#include <linux/mutex.h>
#include <linux/sort.h>
@@ -268,3 +269,196 @@ static int __init finalize_pkvm(void)
return ret;
}
device_initcall_sync(finalize_pkvm);
+
+static int cmp_mappings(struct rb_node *node, const struct rb_node *parent)
+{
+ struct pkvm_mapping *a = rb_entry(node, struct pkvm_mapping, node);
+ struct pkvm_mapping *b = rb_entry(parent, struct pkvm_mapping, node);
+
+ if (a->gfn < b->gfn)
+ return -1;
+ if (a->gfn > b->gfn)
+ return 1;
+ return 0;
+}
+
+static struct rb_node *find_first_mapping_node(struct rb_root *root, u64 gfn)
+{
+ struct rb_node *node = root->rb_node, *prev = NULL;
+ struct pkvm_mapping *mapping;
+
+ while (node) {
+ mapping = rb_entry(node, struct pkvm_mapping, node);
+ if (mapping->gfn == gfn)
+ return node;
+ prev = node;
+ node = (gfn < mapping->gfn) ? node->rb_left : node->rb_right;
+ }
+
+ return prev;
+}
+
+#define for_each_mapping_in_range(pgt, start_ipa, end_ipa, mapping, tmp) \
+ for (tmp = find_first_mapping_node(&pgt->pkvm.mappings, ((start_ipa) >> PAGE_SHIFT)); \
+ tmp && ({ mapping = rb_entry(tmp, struct pkvm_mapping, node); tmp = rb_next(tmp); 1; });) \
+ if (mapping->gfn < ((start_ipa) >> PAGE_SHIFT)) \
+ continue; \
+ else if (mapping->gfn >= ((end_ipa) >> PAGE_SHIFT)) \
+ break; \
+ else
+
+int pkvm_pgtable_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu, struct kvm_pgtable_mm_ops *mm_ops)
+{
+ pgt->pkvm.kvm = kvm_s2_mmu_to_kvm(mmu);
+ pgt->pkvm.mappings = RB_ROOT;
+ rwlock_init(&pgt->pkvm.mappings_lock);
+
+ return 0;
+}
+
+void pkvm_pgtable_destroy(struct kvm_pgtable *pgt)
+{
+ pkvm_handle_t handle = pkvm_pgt_to_handle(pgt);
+ struct pkvm_mapping *mapping;
+ struct rb_node *node;
+
+ if (!handle)
+ return;
+
+ node = rb_first(&pgt->pkvm.mappings);
+ while (node) {
+ mapping = rb_entry(node, struct pkvm_mapping, node);
+ kvm_call_hyp_nvhe(__pkvm_host_unshare_guest, handle, mapping->gfn);
+ node = rb_next(node);
+ rb_erase(&mapping->node, &pgt->pkvm.mappings);
+ kfree(mapping);
+ }
+}
+
+int pkvm_pgtable_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
+ u64 phys, enum kvm_pgtable_prot prot,
+ void *mc, enum kvm_pgtable_walk_flags flags)
+{
+ struct pkvm_mapping *mapping = NULL;
+ struct kvm_hyp_memcache *cache = mc;
+ u64 gfn = addr >> PAGE_SHIFT;
+ u64 pfn = phys >> PAGE_SHIFT;
+ int ret;
+
+ if (size != PAGE_SIZE)
+ return -EINVAL;
+
+ write_lock(&pgt->pkvm.mappings_lock);
+ ret = kvm_call_hyp_nvhe(__pkvm_host_share_guest, pfn, gfn, prot);
+ if (ret) {
+ /* Is the gfn already mapped due to a racing vCPU? */
+ if (ret == -EPERM)
+ ret = -EAGAIN;
+ goto unlock;
+ }
+
+ swap(mapping, cache->mapping);
+ mapping->gfn = gfn;
+ mapping->pfn = pfn;
+ WARN_ON(rb_find_add(&mapping->node, &pgt->pkvm.mappings, cmp_mappings));
+unlock:
+ write_unlock(&pgt->pkvm.mappings_lock);
+
+ return ret;
+}
+
+int pkvm_pgtable_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
+{
+ pkvm_handle_t handle = pkvm_pgt_to_handle(pgt);
+ struct pkvm_mapping *mapping;
+ struct rb_node *tmp;
+ int ret = 0;
+
+ write_lock(&pgt->pkvm.mappings_lock);
+ for_each_mapping_in_range(pgt, addr, addr + size, mapping, tmp) {
+ ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_guest, handle, mapping->gfn);
+ if (WARN_ON(ret))
+ break;
+
+ rb_erase(&mapping->node, &pgt->pkvm.mappings);
+ kfree(mapping);
+ }
+ write_unlock(&pgt->pkvm.mappings_lock);
+
+ return ret;
+}
+
+int pkvm_pgtable_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
+{
+ pkvm_handle_t handle = pkvm_pgt_to_handle(pgt);
+ struct pkvm_mapping *mapping;
+ struct rb_node *tmp;
+ int ret = 0;
+
+ read_lock(&pgt->pkvm.mappings_lock);
+ for_each_mapping_in_range(pgt, addr, addr + size, mapping, tmp) {
+ ret = kvm_call_hyp_nvhe(__pkvm_host_wrprotect_guest, handle, mapping->gfn);
+ if (WARN_ON(ret))
+ break;
+ }
+ read_unlock(&pgt->pkvm.mappings_lock);
+
+ return ret;
+}
+
+int pkvm_pgtable_flush(struct kvm_pgtable *pgt, u64 addr, u64 size)
+{
+ struct pkvm_mapping *mapping;
+ struct rb_node *tmp;
+
+ read_lock(&pgt->pkvm.mappings_lock);
+ for_each_mapping_in_range(pgt, addr, addr + size, mapping, tmp)
+ __clean_dcache_guest_page(pfn_to_kaddr(mapping->pfn), PAGE_SIZE);
+ read_unlock(&pgt->pkvm.mappings_lock);
+
+ return 0;
+}
+
+bool pkvm_pgtable_test_clear_young(struct kvm_pgtable *pgt, u64 addr, u64 size, bool mkold)
+{
+ pkvm_handle_t handle = pkvm_pgt_to_handle(pgt);
+ struct pkvm_mapping *mapping;
+ struct rb_node *tmp;
+ bool young = false;
+
+ read_lock(&pgt->pkvm.mappings_lock);
+ for_each_mapping_in_range(pgt, addr, addr + size, mapping, tmp)
+ young |= kvm_call_hyp_nvhe(__pkvm_host_wrprotect_guest, handle, mapping->gfn, mkold);
+ read_unlock(&pgt->pkvm.mappings_lock);
+
+ return young;
+}
+
+int pkvm_pgtable_relax_perms(struct kvm_pgtable *pgt, u64 addr, enum kvm_pgtable_prot prot,
+ enum kvm_pgtable_walk_flags flags)
+{
+ return kvm_call_hyp_nvhe(__pkvm_host_relax_guest_perms, addr >> PAGE_SHIFT, prot);
+}
+
+kvm_pte_t pkvm_pgtable_mkyoung(struct kvm_pgtable *pgt, u64 addr, enum kvm_pgtable_walk_flags flags)
+{
+ return kvm_call_hyp_nvhe(__pkvm_host_mkyoung_guest, addr >> PAGE_SHIFT);
+}
+
+void pkvm_pgtable_free_unlinked(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, s8 level)
+{
+ WARN_ON(1);
+}
+
+kvm_pte_t *pkvm_pgtable_create_unlinked(struct kvm_pgtable *pgt, u64 phys, s8 level,
+ enum kvm_pgtable_prot prot, void *mc, bool force_pte)
+{
+ WARN_ON(1);
+ return NULL;
+}
+
+int pkvm_pgtable_split(struct kvm_pgtable *pgt, u64 addr, u64 size, struct kvm_mmu_memory_cache *mc)
+{
+ WARN_ON(1);
+ return -EINVAL;
+}
--
2.47.0.163.g1226f6d8fa-goog
* Re: [PATCH 17/18] KVM: arm64: Introduce the EL1 pKVM MMU
2024-11-04 13:32 ` [PATCH 17/18] KVM: arm64: Introduce the EL1 pKVM MMU Quentin Perret
@ 2024-11-06 16:58 ` Quentin Perret
0 siblings, 0 replies; 24+ messages in thread
From: Quentin Perret @ 2024-11-06 16:58 UTC (permalink / raw)
To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Catalin Marinas, Will Deacon
Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
kvmarm, linux-kernel
On Monday 04 Nov 2024 at 13:32:03 (+0000), Quentin Perret wrote:
> +bool pkvm_pgtable_test_clear_young(struct kvm_pgtable *pgt, u64 addr, u64 size, bool mkold)
> +{
> + pkvm_handle_t handle = pkvm_pgt_to_handle(pgt);
> + struct pkvm_mapping *mapping;
> + struct rb_node *tmp;
> + bool young = false;
> +
> + read_lock(&pgt->pkvm.mappings_lock);
> + for_each_mapping_in_range(pgt, addr, addr + size, mapping, tmp)
> + young |= kvm_call_hyp_nvhe(__pkvm_host_wrprotect_guest, handle, mapping->gfn, mkold);
> + read_unlock(&pgt->pkvm.mappings_lock);
> +
> + return young;
> +}
I just observed some funny behaviour in one of my tests, and the above
explains it ... Can you find the bug? Ahem. I'll fix it in v2, obviously.
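For readers following along, one plausible reading (the message does not
spell the bug out) is that the loop issues __pkvm_host_wrprotect_guest, the
same hypercall pkvm_pgtable_wrprotect() uses, instead of an access-flag
hypercall. A userspace sketch of the difference follows; struct fake_page
and both stub "hypercalls" are hypothetical, and the name of the
test-and-clear-young hypercall is assumed rather than taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one guest page's stage-2 state. */
struct fake_page {
	bool young;	/* access flag set */
	bool writable;
};

/* Stub for a wrprotect-style hypercall: returns 0 on success, never "young". */
static int hyp_wrprotect_guest(struct fake_page *p)
{
	p->writable = false;
	return 0;
}

/* Stub for a test-and-clear-young hypercall (name assumed, not from the patch). */
static int hyp_test_clear_young_guest(struct fake_page *p, bool mkold)
{
	bool young = p->young;

	if (mkold)
		p->young = false;
	return young;
}

/* Shape of the loop as posted: wrong hypercall, so the result is always false. */
static bool test_clear_young_as_posted(struct fake_page *pages, size_t n, bool mkold)
{
	bool young = false;

	(void)mkold;	/* the wrprotect stub has no use for it */
	for (size_t i = 0; i < n; i++)
		young |= hyp_wrprotect_guest(&pages[i]);
	return young;
}

/* Same loop calling the access-flag hypercall instead. */
static bool test_clear_young_expected(struct fake_page *pages, size_t n, bool mkold)
{
	bool young = false;

	for (size_t i = 0; i < n; i++)
		young |= hyp_test_clear_young_guest(&pages[i], mkold);
	return young;
}
```

In this model the posted variant never reports a young page and
write-protects every page it visits as a side effect, which would be
consistent with odd behaviour showing up in a test.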
* [PATCH 18/18] KVM: arm64: Plumb the pKVM MMU in KVM
2024-11-04 13:31 [PATCH 00/18] KVM: arm64: Non-protected guest stage-2 support for pKVM Quentin Perret
` (16 preceding siblings ...)
2024-11-04 13:32 ` [PATCH 17/18] KVM: arm64: Introduce the EL1 pKVM MMU Quentin Perret
@ 2024-11-04 13:32 ` Quentin Perret
2024-11-05 5:53 ` kernel test robot
17 siblings, 1 reply; 24+ messages in thread
From: Quentin Perret @ 2024-11-04 13:32 UTC (permalink / raw)
To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Catalin Marinas, Will Deacon
Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
kvmarm, linux-kernel
Introduce the KVM_PGT_S2() helper macro to allow switching from the
traditional pgtable code to the pKVM version easily in mmu.c. The cost
of this 'indirection' is expected to be minimal, since
is_protected_kvm_enabled() is backed by a static key.
With this, everything is in place to allow the delegation of
non-protected guest stage-2 page-tables to pKVM, so let's stop using the
host's kvm_s2_mmu from EL2 and enjoy the ride.
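A minimal userspace sketch of the dispatch pattern described above, assuming
GCC or Clang (statement expressions and __typeof__ are GNU extensions; the
kernel spells it typeof). The protected_mode flag and the two stub back ends
are hypothetical stand-ins; only the macro body mirrors the one in this
patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the static-key-backed is_protected_kvm_enabled(). */
static bool protected_mode;

static bool is_protected_kvm_enabled(void)
{
	return protected_mode;
}

/* Stub back ends; the real ones take a struct kvm_pgtable * as well. */
static int kvm_pgtable_stage2_unmap(unsigned long addr, unsigned long size)
{
	(void)addr;
	(void)size;
	return 1;	/* marks the "traditional" path */
}

static int pkvm_pgtable_unmap(unsigned long addr, unsigned long size)
{
	(void)addr;
	(void)size;
	return 2;	/* marks the "pKVM" path */
}

/* Same shape as the macro in the patch: token pasting builds both names. */
#define KVM_PGT_S2(fn, ...)                                                      \
	({                                                                       \
		__typeof__(kvm_pgtable_stage2_ ## fn) *__fn =                    \
			kvm_pgtable_stage2_ ## fn;                               \
		if (is_protected_kvm_enabled())                                  \
			__fn = pkvm_pgtable_ ## fn;                              \
		__fn(__VA_ARGS__);                                               \
	})

/* Wrapper in the style of kvm_s2_unmap(): usable as a plain function pointer. */
static int s2_unmap(unsigned long addr, unsigned long size)
{
	return KVM_PGT_S2(unmap, addr, size);
}
```

Because __fn is declared with the type of the kvm_pgtable_stage2_* function,
assigning a pkvm_pgtable_* function with a different prototype draws a
compiler diagnostic, so the two back ends stay in sync at build time.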
Signed-off-by: Quentin Perret <qperret@google.com>
---
arch/arm64/kvm/arm.c | 9 ++-
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 2 -
arch/arm64/kvm/mmu.c | 104 +++++++++++++++++++++--------
3 files changed, 84 insertions(+), 31 deletions(-)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 2bf168b17a77..890c89874c6b 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -506,7 +506,10 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
if (vcpu_has_run_once(vcpu) && unlikely(!irqchip_in_kernel(vcpu->kvm)))
static_branch_dec(&userspace_irqchip_in_use);
- kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
+ if (!is_protected_kvm_enabled())
+ kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
+ else
+ free_hyp_memcache(&vcpu->arch.pkvm_memcache);
kvm_timer_vcpu_terminate(vcpu);
kvm_pmu_vcpu_destroy(vcpu);
kvm_vgic_vcpu_destroy(vcpu);
@@ -578,6 +581,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
struct kvm_s2_mmu *mmu;
int *last_ran;
+ if (is_protected_kvm_enabled())
+ goto nommu;
+
if (vcpu_has_nv(vcpu))
kvm_vcpu_load_hw_mmu(vcpu);
@@ -598,6 +604,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
*last_ran = vcpu->vcpu_idx;
}
+nommu:
vcpu->cpu = cpu;
kvm_vgic_load(vcpu);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 1d8baa14ff1c..cf0fd83552c9 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -103,8 +103,6 @@ static void flush_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
/* Limit guest vector length to the maximum supported by the host. */
hyp_vcpu->vcpu.arch.sve_max_vl = min(host_vcpu->arch.sve_max_vl, kvm_host_sve_max_vl);
- hyp_vcpu->vcpu.arch.hw_mmu = host_vcpu->arch.hw_mmu;
-
hyp_vcpu->vcpu.arch.hcr_el2 = host_vcpu->arch.hcr_el2;
hyp_vcpu->vcpu.arch.mdcr_el2 = host_vcpu->arch.mdcr_el2;
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 80dd61038cc7..fcf8fdcccd22 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -15,6 +15,7 @@
#include <asm/kvm_arm.h>
#include <asm/kvm_mmu.h>
#include <asm/kvm_pgtable.h>
+#include <asm/kvm_pkvm.h>
#include <asm/kvm_ras.h>
#include <asm/kvm_asm.h>
#include <asm/kvm_emulate.h>
@@ -31,6 +32,14 @@ static phys_addr_t __ro_after_init hyp_idmap_vector;
static unsigned long __ro_after_init io_map_base;
+#define KVM_PGT_S2(fn, ...) \
+ ({ \
+ typeof(kvm_pgtable_stage2_ ## fn) *__fn = kvm_pgtable_stage2_ ## fn; \
+ if (is_protected_kvm_enabled()) \
+ __fn = pkvm_pgtable_ ## fn; \
+ __fn(__VA_ARGS__); \
+ })
+
static phys_addr_t __stage2_range_addr_end(phys_addr_t addr, phys_addr_t end,
phys_addr_t size)
{
@@ -147,7 +156,7 @@ static int kvm_mmu_split_huge_pages(struct kvm *kvm, phys_addr_t addr,
return -EINVAL;
next = __stage2_range_addr_end(addr, end, chunk_size);
- ret = kvm_pgtable_stage2_split(pgt, addr, next - addr, cache);
+ ret = KVM_PGT_S2(split, pgt, addr, next - addr, cache);
if (ret)
break;
} while (addr = next, addr != end);
@@ -168,15 +177,23 @@ static bool memslot_is_logging(struct kvm_memory_slot *memslot)
*/
int kvm_arch_flush_remote_tlbs(struct kvm *kvm)
{
- kvm_call_hyp(__kvm_tlb_flush_vmid, &kvm->arch.mmu);
+ if (is_protected_kvm_enabled())
+ kvm_call_hyp_nvhe(__pkvm_tlb_flush_vmid, kvm->arch.pkvm.handle);
+ else
+ kvm_call_hyp(__kvm_tlb_flush_vmid, &kvm->arch.mmu);
return 0;
}
int kvm_arch_flush_remote_tlbs_range(struct kvm *kvm,
gfn_t gfn, u64 nr_pages)
{
- kvm_tlb_flush_vmid_range(&kvm->arch.mmu,
- gfn << PAGE_SHIFT, nr_pages << PAGE_SHIFT);
+ u64 size = nr_pages << PAGE_SHIFT;
+ u64 addr = gfn << PAGE_SHIFT;
+
+ if (is_protected_kvm_enabled())
+ kvm_call_hyp_nvhe(__pkvm_tlb_flush_vmid, kvm->arch.pkvm.handle);
+ else
+ kvm_tlb_flush_vmid_range(&kvm->arch.mmu, addr, size);
return 0;
}
@@ -225,7 +242,7 @@ static void stage2_free_unlinked_table_rcu_cb(struct rcu_head *head)
void *pgtable = page_to_virt(page);
s8 level = page_private(page);
- kvm_pgtable_stage2_free_unlinked(&kvm_s2_mm_ops, pgtable, level);
+ KVM_PGT_S2(free_unlinked, &kvm_s2_mm_ops, pgtable, level);
}
static void stage2_free_unlinked_table(void *addr, s8 level)
@@ -316,6 +333,12 @@ static void invalidate_icache_guest_page(void *va, size_t size)
* destroying the VM), otherwise another faulting VCPU may come in and mess
* with things behind our backs.
*/
+
+static int kvm_s2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
+{
+ return KVM_PGT_S2(unmap, pgt, addr, size);
+}
+
static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size,
bool may_block)
{
@@ -324,8 +347,7 @@ static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64
lockdep_assert_held_write(&kvm->mmu_lock);
WARN_ON(size & ~PAGE_MASK);
- WARN_ON(stage2_apply_range(mmu, start, end, kvm_pgtable_stage2_unmap,
- may_block));
+ WARN_ON(stage2_apply_range(mmu, start, end, kvm_s2_unmap, may_block));
}
void kvm_stage2_unmap_range(struct kvm_s2_mmu *mmu, phys_addr_t start,
@@ -334,9 +356,14 @@ void kvm_stage2_unmap_range(struct kvm_s2_mmu *mmu, phys_addr_t start,
__unmap_stage2_range(mmu, start, size, may_block);
}
+static int kvm_s2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size)
+{
+ return KVM_PGT_S2(flush, pgt, addr, size);
+}
+
void kvm_stage2_flush_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end)
{
- stage2_apply_range_resched(mmu, addr, end, kvm_pgtable_stage2_flush);
+ stage2_apply_range_resched(mmu, addr, end, kvm_s2_flush);
}
static void stage2_flush_memslot(struct kvm *kvm,
@@ -942,10 +969,14 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
return -ENOMEM;
mmu->arch = &kvm->arch;
- err = kvm_pgtable_stage2_init(pgt, mmu, &kvm_s2_mm_ops);
+ err = KVM_PGT_S2(init, pgt, mmu, &kvm_s2_mm_ops);
if (err)
goto out_free_pgtable;
+ mmu->pgt = pgt;
+ if (is_protected_kvm_enabled())
+ return 0;
+
mmu->last_vcpu_ran = alloc_percpu(typeof(*mmu->last_vcpu_ran));
if (!mmu->last_vcpu_ran) {
err = -ENOMEM;
@@ -959,7 +990,6 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
mmu->split_page_chunk_size = KVM_ARM_EAGER_SPLIT_CHUNK_SIZE_DEFAULT;
mmu->split_page_cache.gfp_zero = __GFP_ZERO;
- mmu->pgt = pgt;
mmu->pgd_phys = __pa(pgt->pgd);
if (kvm_is_nested_s2_mmu(kvm, mmu))
@@ -968,7 +998,7 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
return 0;
out_destroy_pgtable:
- kvm_pgtable_stage2_destroy(pgt);
+ KVM_PGT_S2(destroy, pgt);
out_free_pgtable:
kfree(pgt);
return err;
@@ -1065,7 +1095,7 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
write_unlock(&kvm->mmu_lock);
if (pgt) {
- kvm_pgtable_stage2_destroy(pgt);
+ KVM_PGT_S2(destroy, pgt);
kfree(pgt);
}
}
@@ -1082,9 +1112,11 @@ static void *hyp_mc_alloc_fn(void *unused)
void free_hyp_memcache(struct kvm_hyp_memcache *mc)
{
- if (is_protected_kvm_enabled())
- __free_hyp_memcache(mc, hyp_mc_free_fn,
- kvm_host_va, NULL);
+ if (!is_protected_kvm_enabled())
+ return;
+
+ kfree(mc->mapping);
+ __free_hyp_memcache(mc, hyp_mc_free_fn, kvm_host_va, NULL);
}
int topup_hyp_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages)
@@ -1092,6 +1124,12 @@ int topup_hyp_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages)
if (!is_protected_kvm_enabled())
return 0;
+ if (!mc->mapping) {
+ mc->mapping = kzalloc(sizeof(struct pkvm_mapping), GFP_KERNEL_ACCOUNT);
+ if (!mc->mapping)
+ return -ENOMEM;
+ }
+
return __topup_hyp_memcache(mc, min_pages, hyp_mc_alloc_fn,
kvm_host_pa, NULL);
}
@@ -1130,8 +1168,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
break;
write_lock(&kvm->mmu_lock);
- ret = kvm_pgtable_stage2_map(pgt, addr, PAGE_SIZE, pa, prot,
- &cache, 0);
+ ret = KVM_PGT_S2(map, pgt, addr, PAGE_SIZE, pa, prot, &cache, 0);
write_unlock(&kvm->mmu_lock);
if (ret)
break;
@@ -1143,6 +1180,10 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
return ret;
}
+static int kvm_s2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
+{
+ return KVM_PGT_S2(wrprotect, pgt, addr, size);
+}
/**
* kvm_stage2_wp_range() - write protect stage2 memory region range
* @mmu: The KVM stage-2 MMU pointer
@@ -1151,7 +1192,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
*/
void kvm_stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end)
{
- stage2_apply_range_resched(mmu, addr, end, kvm_pgtable_stage2_wrprotect);
+ stage2_apply_range_resched(mmu, addr, end, kvm_s2_wrprotect);
}
/**
@@ -1431,9 +1472,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
unsigned long mmu_seq;
phys_addr_t ipa = fault_ipa;
struct kvm *kvm = vcpu->kvm;
- struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
struct vm_area_struct *vma;
short vma_shift;
+ void *memcache;
gfn_t gfn;
kvm_pfn_t pfn;
bool logging_active = memslot_is_logging(memslot);
@@ -1460,8 +1501,15 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
* and a write fault needs to collapse a block entry into a table.
*/
if (!fault_is_perm || (logging_active && write_fault)) {
- ret = kvm_mmu_topup_memory_cache(memcache,
- kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu));
+ int min_pages = kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu);
+
+ if (!is_protected_kvm_enabled()) {
+ memcache = &vcpu->arch.mmu_page_cache;
+ ret = kvm_mmu_topup_memory_cache(memcache, min_pages);
+ } else {
+ memcache = &vcpu->arch.pkvm_memcache;
+ ret = topup_hyp_memcache(memcache, min_pages);
+ }
if (ret)
return ret;
}
@@ -1482,7 +1530,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
* logging_active is guaranteed to never be true for VM_PFNMAP
* memslots.
*/
- if (logging_active) {
+ if (logging_active || is_protected_kvm_enabled()) {
force_pte = true;
vma_shift = PAGE_SHIFT;
} else {
@@ -1684,9 +1732,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
* PTE, which will be preserved.
*/
prot &= ~KVM_NV_GUEST_MAP_SZ;
- ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot, flags);
+ ret = KVM_PGT_S2(relax_perms, pgt, fault_ipa, prot, flags);
} else {
- ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
+ ret = KVM_PGT_S2(map, pgt, fault_ipa, vma_pagesize,
__pfn_to_phys(pfn), prot,
memcache, flags);
}
@@ -1715,7 +1763,7 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
read_lock(&vcpu->kvm->mmu_lock);
mmu = vcpu->arch.hw_mmu;
- pte = kvm_pgtable_stage2_mkyoung(mmu->pgt, fault_ipa, flags);
+ pte = KVM_PGT_S2(mkyoung, mmu->pgt, fault_ipa, flags);
read_unlock(&vcpu->kvm->mmu_lock);
if (kvm_pte_valid(pte))
@@ -1758,7 +1806,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
}
/* Falls between the IPA range and the PARange? */
- if (fault_ipa >= BIT_ULL(vcpu->arch.hw_mmu->pgt->ia_bits)) {
+ if (fault_ipa >= BIT_ULL(VTCR_EL2_IPA(vcpu->arch.hw_mmu->vtcr))) {
fault_ipa |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0);
if (is_iabt)
@@ -1924,7 +1972,7 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
if (!kvm->arch.mmu.pgt)
return false;
- return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
+ return KVM_PGT_S2(test_clear_young, kvm->arch.mmu.pgt,
range->start << PAGE_SHIFT,
size, true);
/*
@@ -1940,7 +1988,7 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
if (!kvm->arch.mmu.pgt)
return false;
- return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
+ return KVM_PGT_S2(test_clear_young, kvm->arch.mmu.pgt,
range->start << PAGE_SHIFT,
size, false);
}
--
2.47.0.163.g1226f6d8fa-goog
* Re: [PATCH 18/18] KVM: arm64: Plumb the pKVM MMU in KVM
2024-11-04 13:32 ` [PATCH 18/18] KVM: arm64: Plumb the pKVM MMU in KVM Quentin Perret
@ 2024-11-05 5:53 ` kernel test robot
2024-11-05 16:07 ` Quentin Perret
0 siblings, 1 reply; 24+ messages in thread
From: kernel test robot @ 2024-11-05 5:53 UTC (permalink / raw)
To: Quentin Perret, Marc Zyngier, Oliver Upton, Joey Gouly,
Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon
Cc: oe-kbuild-all, Fuad Tabba, Vincent Donnefort, Sebastian Ene,
linux-arm-kernel, kvmarm, linux-kernel
Hi Quentin,
kernel test robot noticed the following build warnings:
[auto build test WARNING on v6.12-rc6]
[also build test WARNING on linus/master]
[cannot apply to kvmarm/next next-20241104]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Quentin-Perret/KVM-arm64-Change-the-layout-of-enum-pkvm_page_state/20241104-213817
base: v6.12-rc6
patch link: https://lore.kernel.org/r/20241104133204.85208-19-qperret%40google.com
patch subject: [PATCH 18/18] KVM: arm64: Plumb the pKVM MMU in KVM
config: arm64-randconfig-002-20241105 (https://download.01.org/0day-ci/archive/20241105/202411051325.EBkzE0th-lkp@intel.com/config)
compiler: aarch64-linux-gcc (GCC) 14.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241105/202411051325.EBkzE0th-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202411051325.EBkzE0th-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> arch/arm64/kvm/mmu.c:338: warning: Function parameter or struct member 'pgt' not described in 'kvm_s2_unmap'
>> arch/arm64/kvm/mmu.c:338: warning: Function parameter or struct member 'addr' not described in 'kvm_s2_unmap'
>> arch/arm64/kvm/mmu.c:338: warning: expecting prototype for __unmap_stage2_range(). Prototype was for kvm_s2_unmap() instead
vim +338 arch/arm64/kvm/mmu.c
299
300 /*
301 * Unmapping vs dcache management:
302 *
303 * If a guest maps certain memory pages as uncached, all writes will
304 * bypass the data cache and go directly to RAM. However, the CPUs
305 * can still speculate reads (not writes) and fill cache lines with
306 * data.
307 *
308 * Those cache lines will be *clean* cache lines though, so a
309 * clean+invalidate operation is equivalent to an invalidate
310 * operation, because no cache lines are marked dirty.
311 *
312 * Those clean cache lines could be filled prior to an uncached write
313 * by the guest, and the cache coherent IO subsystem would therefore
314 * end up writing old data to disk.
315 *
316 * This is why right after unmapping a page/section and invalidating
317 * the corresponding TLBs, we flush to make sure the IO subsystem will
318 * never hit in the cache.
319 *
320 * This is all avoided on systems that have ARM64_HAS_STAGE2_FWB, as
321 * we then fully enforce cacheability of RAM, no matter what the guest
322 * does.
323 */
324 /**
325 * __unmap_stage2_range -- Clear stage2 page table entries to unmap a range
326 * @mmu: The KVM stage-2 MMU pointer
327 * @start: The intermediate physical base address of the range to unmap
328 * @size: The size of the area to unmap
329 * @may_block: Whether or not we are permitted to block
330 *
331 * Clear a range of stage-2 mappings, lowering the various ref-counts. Must
332 * be called while holding mmu_lock (unless for freeing the stage2 pgd before
333 * destroying the VM), otherwise another faulting VCPU may come in and mess
334 * with things behind our backs.
335 */
336
337 static int kvm_s2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
> 338 {
339 return KVM_PGT_S2(unmap, pgt, addr, size);
340 }
341
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH 18/18] KVM: arm64: Plumb the pKVM MMU in KVM
2024-11-05 5:53 ` kernel test robot
@ 2024-11-05 16:07 ` Quentin Perret
0 siblings, 0 replies; 24+ messages in thread
From: Quentin Perret @ 2024-11-05 16:07 UTC (permalink / raw)
To: kernel test robot
Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Catalin Marinas, Will Deacon, oe-kbuild-all,
Fuad Tabba, Vincent Donnefort, Sebastian Ene, linux-arm-kernel,
kvmarm, linux-kernel
On Tuesday 05 Nov 2024 at 13:53:22 (+0800), kernel test robot wrote:
> Hi Quentin,
>
> kernel test robot noticed the following build warnings:
>
> [auto build test WARNING on v6.12-rc6]
> [also build test WARNING on linus/master]
> [cannot apply to kvmarm/next next-20241104]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Quentin-Perret/KVM-arm64-Change-the-layout-of-enum-pkvm_page_state/20241104-213817
> base: v6.12-rc6
> patch link: https://lore.kernel.org/r/20241104133204.85208-19-qperret%40google.com
> patch subject: [PATCH 18/18] KVM: arm64: Plumb the pKVM MMU in KVM
> config: arm64-randconfig-002-20241105 (https://download.01.org/0day-ci/archive/20241105/202411051325.EBkzE0th-lkp@intel.com/config)
> compiler: aarch64-linux-gcc (GCC) 14.1.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241105/202411051325.EBkzE0th-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202411051325.EBkzE0th-lkp@intel.com/
>
> All warnings (new ones prefixed by >>):
>
> >> arch/arm64/kvm/mmu.c:338: warning: Function parameter or struct member 'pgt' not described in 'kvm_s2_unmap'
> >> arch/arm64/kvm/mmu.c:338: warning: Function parameter or struct member 'addr' not described in 'kvm_s2_unmap'
> >> arch/arm64/kvm/mmu.c:338: warning: expecting prototype for __unmap_stage2_range(). Prototype was for kvm_s2_unmap() instead
>
>
> vim +338 arch/arm64/kvm/mmu.c
>
> 299
> 300 /*
> 301 * Unmapping vs dcache management:
> 302 *
> 303 * If a guest maps certain memory pages as uncached, all writes will
> 304 * bypass the data cache and go directly to RAM. However, the CPUs
> 305 * can still speculate reads (not writes) and fill cache lines with
> 306 * data.
> 307 *
> 308 * Those cache lines will be *clean* cache lines though, so a
> 309 * clean+invalidate operation is equivalent to an invalidate
> 310 * operation, because no cache lines are marked dirty.
> 311 *
> 312 * Those clean cache lines could be filled prior to an uncached write
> 313 * by the guest, and the cache coherent IO subsystem would therefore
> 314 * end up writing old data to disk.
> 315 *
> 316 * This is why right after unmapping a page/section and invalidating
> 317 * the corresponding TLBs, we flush to make sure the IO subsystem will
> 318 * never hit in the cache.
> 319 *
> 320 * This is all avoided on systems that have ARM64_HAS_STAGE2_FWB, as
> 321 * we then fully enforce cacheability of RAM, no matter what the guest
> 322 * does.
> 323 */
> 324 /**
> 325 * __unmap_stage2_range -- Clear stage2 page table entries to unmap a range
> 326 * @mmu: The KVM stage-2 MMU pointer
> 327 * @start: The intermediate physical base address of the range to unmap
> 328 * @size: The size of the area to unmap
> 329 * @may_block: Whether or not we are permitted to block
> 330 *
> 331 * Clear a range of stage-2 mappings, lowering the various ref-counts. Must
> 332 * be called while holding mmu_lock (unless for freeing the stage2 pgd before
> 333 * destroying the VM), otherwise another faulting VCPU may come in and mess
> 334 * with things behind our backs.
> 335 */
> 336
> 337 static int kvm_s2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
> > 338 {
> 339 return KVM_PGT_S2(unmap, pgt, addr, size);
> 340 }
> 341
Oops, yes, that broke the kerneldoc comment, I'll fix in v2.
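A sketch of one way the warning could be addressed, not necessarily what v2
does: kernel-doc expects a /** ... */ block to sit directly above the
function it names, so the new helper can be moved above the comment. The
types and pgt_s2_unmap_stub() are hypothetical stand-ins so that the
fragment compiles outside the kernel, and __unmap_stage2_range() is reduced
to three parameters and an int return for brevity.

```c
#include <stdint.h>

typedef uint64_t u64;
struct kvm_pgtable;	/* opaque in this sketch */

/* Stub standing in for KVM_PGT_S2(unmap, ...); always succeeds. */
static int pgt_s2_unmap_stub(struct kvm_pgtable *pgt, u64 addr, u64 size)
{
	(void)pgt;
	(void)addr;
	(void)size;
	return 0;
}

/* Helper placed above the kernel-doc block, with a plain comment only. */
static int kvm_s2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
{
	return pgt_s2_unmap_stub(pgt, addr, size);
}

/**
 * __unmap_stage2_range -- Clear stage2 page table entries to unmap a range
 * @pgt:   The stage-2 page-table (simplified; the real function takes @mmu)
 * @start: The intermediate physical base address of the range to unmap
 * @size:  The size of the area to unmap
 */
static int __unmap_stage2_range(struct kvm_pgtable *pgt, u64 start, u64 size)
{
	return kvm_s2_unmap(pgt, start, size);
}
```

With this ordering the kernel-doc block is again followed immediately by the
function whose name and parameters it describes.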