From: Quentin Perret <qperret@google.com>
To: catalin.marinas@arm.com, will@kernel.org, maz@kernel.org,
	james.morse@arm.com, julien.thierry.kdev@gmail.com,
	suzuki.poulose@arm.com
Cc: android-kvm@google.com, linux-kernel@vger.kernel.org,
	kernel-team@android.com, kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org, tabba@google.com,
	mark.rutland@arm.com, dbrazdil@google.com,
	mate.toth-pal@arm.com, seanjc@google.com, qperret@google.com,
	robh+dt@kernel.org, ardb@kernel.org
Subject: [PATCH v4 28/34] KVM: arm64: Use page-table to track page ownership
Date: Wed, 10 Mar 2021 17:57:45 +0000
Message-ID: <20210310175751.3320106-29-qperret@google.com>
In-Reply-To: <20210310175751.3320106-1-qperret@google.com>

As the host stage 2 will be identity mapped, all the .hyp memory regions
and/or memory pages donated to protected guests will have to be marked
invalid in the host stage 2 page-table. At the same time, the hypervisor
will need a way to track the ownership of each physical page to ensure
memory sharing or donation between entities (host, guests, hypervisor) is
legal.

In order to enable this tracking at EL2, let's use the host stage 2
page-table itself. The idea is to use the top bits of invalid mappings
to store the unique identifier of the page owner. The page-table owner
(the host) gets identifier 0 such that, at boot time, it owns the entire
IPA space as the pgd starts zeroed.
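
The encoding described above can be sketched in standalone C as follows.
This is only an illustrative userspace model, not the kernel code: the
names pte_t, PTE_VALID, PTE_OWNER_SHIFT, PTE_OWNER_MASK, make_owner_pte()
and pte_owner() are hypothetical, and the sketch assumes the owner id sits
in bits [63:32] (as KVM_INVALID_PTE_OWNER_MASK does in this patch) and
that bit 0 is the valid bit.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pte_t;                 /* hypothetical stand-in for kvm_pte_t */

#define PTE_VALID        ((pte_t)1 << 0)                    /* assumed valid bit */
#define PTE_OWNER_SHIFT  32
#define PTE_OWNER_MASK   ((pte_t)0xffffffffu << PTE_OWNER_SHIFT) /* bits [63:32] */

/* Build an invalid entry whose top 32 bits carry the owner id. */
static inline pte_t make_owner_pte(uint32_t owner_id)
{
	pte_t pte = (pte_t)owner_id << PTE_OWNER_SHIFT;

	/* The low bits are untouched, so the entry can never be valid. */
	assert(!(pte & PTE_VALID));
	return pte;
}

/* Recover the owner id from an invalid entry (illustrative helper only). */
static inline uint32_t pte_owner(pte_t pte)
{
	return (uint32_t)((pte & PTE_OWNER_MASK) >> PTE_OWNER_SHIFT);
}

static inline int pte_is_valid(pte_t pte)
{
	return (pte & PTE_VALID) != 0;
}
```

With this layout, owner 0 encodes to an all-zero entry, which is why a
freshly zeroed pgd already reads as "everything is owned by the host".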

Provide kvm_pgtable_stage2_set_owner(), which allows modifying the
ownership of pages in the host stage 2. It re-uses most of the map()
logic, but ends up creating invalid mappings instead. This impacts
how we do refcounting, as we now need to count invalid mappings when
they are used for ownership tracking.
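
The refcounting rule can be modelled in a few lines of standalone C. This
is a simplified sketch under stated assumptions, not the implementation in
the diff below: struct table_page, pte_is_counted() and set_entry() are
hypothetical names, the refcount is kept in the toy table itself rather
than going through mm_ops->get_page()/put_page(), and TLB maintenance and
break-before-make are left out entirely.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pte_t;                 /* hypothetical stand-in for kvm_pte_t */

#define ENTRIES_PER_TABLE 4             /* tiny table, for illustration only */

struct table_page {
	pte_t entries[ENTRIES_PER_TABLE];
	int refcount;                   /* number of counted entries in the table */
};

/*
 * An entry is counted if it is non-zero: either a valid mapping, or an
 * invalid entry annotated with a non-zero owner id.
 */
static inline int pte_is_counted(pte_t pte)
{
	return pte != 0;
}

/* Replace an entry, dropping and taking references as needed. */
static inline void set_entry(struct table_page *t, unsigned int idx, pte_t new_pte)
{
	assert(idx < ENTRIES_PER_TABLE);

	if (pte_is_counted(t->entries[idx]))
		t->refcount--;          /* models put_page() on the old entry */

	t->entries[idx] = new_pte;

	if (pte_is_counted(new_pte))
		t->refcount++;          /* models get_page() on the new entry */
}
```

In this model, handing a page back to owner 0 writes a zero entry and
drops the reference, so a table whose entries are all host-owned and
unmapped ends up with a refcount of zero.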

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/kvm_pgtable.h | 21 +++++++
 arch/arm64/kvm/hyp/pgtable.c         | 92 ++++++++++++++++++++++++----
 2 files changed, 101 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 4ae19247837b..b09af4612656 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -238,6 +238,27 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 			   u64 phys, enum kvm_pgtable_prot prot,
 			   void *mc);
 
+/**
+ * kvm_pgtable_stage2_set_owner() - Annotate invalid mappings with metadata
+ *				    encoding the ownership of a page in the
+ *				    IPA space.
+ * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init().
+ * @addr:	Intermediate physical address at which to place the annotation.
+ * @size:	Size of the IPA range to annotate.
+ * @mc:		Cache of pre-allocated and zeroed memory from which to allocate
+ *		page-table pages.
+ * @owner_id:	Unique identifier for the owner of the page.
+ *
+ * The page-table owner has identifier 0. This function can be used to mark
+ * portions of the IPA space as owned by other entities. When a stage 2 is used
+ * with identity-mappings, these annotations allow the page-table data
+ * structure to be used as a simple rmap.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
+				 void *mc, u32 owner_id);
+
 /**
  * kvm_pgtable_stage2_unmap() - Remove a mapping from a guest stage-2 page-table.
  * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init().
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index f37b4179b880..e4670b639726 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -48,6 +48,8 @@
 					 KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W | \
 					 KVM_PTE_LEAF_ATTR_HI_S2_XN)
 
+#define KVM_INVALID_PTE_OWNER_MASK	GENMASK(63, 32)
+
 struct kvm_pgtable_walk_data {
 	struct kvm_pgtable		*pgt;
 	struct kvm_pgtable_walker	*walker;
@@ -186,6 +188,11 @@ static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
 	return pte;
 }
 
+static kvm_pte_t kvm_init_invalid_leaf_owner(u32 owner_id)
+{
+	return FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id);
+}
+
 static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr,
 				  u32 level, kvm_pte_t *ptep,
 				  enum kvm_pgtable_walk_flags flag)
@@ -440,6 +447,7 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
 struct stage2_map_data {
 	u64				phys;
 	kvm_pte_t			attr;
+	u32				owner_id;
 
 	kvm_pte_t			*anchor;
 	kvm_pte_t			*childp;
@@ -506,6 +514,24 @@ static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
 	return 0;
 }
 
+static bool stage2_is_permission_change(kvm_pte_t old, kvm_pte_t new)
+{
+	if (!kvm_pte_valid(old) || !kvm_pte_valid(new))
+		return false;
+
+	return !((old ^ new) & (~KVM_PTE_LEAF_ATTR_S2_PERMS));
+}
+
+static bool stage2_pte_is_counted(kvm_pte_t pte)
+{
+	/*
+	 * The refcount tracks valid entries as well as invalid entries that
+	 * encode ownership of a page by an entity other than the page-table
+	 * owner, whose id is 0.
+	 */
+	return !!pte;
+}
+
 static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
 				      kvm_pte_t *ptep,
 				      struct stage2_map_data *data)
@@ -517,28 +543,36 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
 	if (!kvm_block_mapping_supported(addr, end, phys, level))
 		return -E2BIG;
 
-	new = kvm_init_valid_leaf_pte(phys, data->attr, level);
-	if (kvm_pte_valid(old)) {
+	if (kvm_pte_valid(data->attr))
+		new = kvm_init_valid_leaf_pte(phys, data->attr, level);
+	else
+		new = kvm_init_invalid_leaf_owner(data->owner_id);
+
+	if (stage2_pte_is_counted(old)) {
 		/*
 		 * Skip updating the PTE if we are trying to recreate the exact
 		 * same mapping or only change the access permissions. Instead,
 		 * the vCPU will exit one more time from guest if still needed
 		 * and then go through the path of relaxing permissions.
 		 */
-		if (!((old ^ new) & (~KVM_PTE_LEAF_ATTR_S2_PERMS)))
+		if (stage2_is_permission_change(old, new))
 			return -EAGAIN;
 
 		/*
-		 * There's an existing different valid leaf entry, so perform
-		 * break-before-make.
+		 * Clear the existing PTE, and perform break-before-make with
+		 * TLB maintenance if it was valid.
 		 */
 		kvm_clear_pte(ptep);
-		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
+		if (kvm_pte_valid(old)) {
+			kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr,
+				     level);
+		}
 		mm_ops->put_page(ptep);
 	}
 
 	smp_store_release(ptep, new);
-	mm_ops->get_page(ptep);
+	if (stage2_pte_is_counted(new))
+		mm_ops->get_page(ptep);
 	data->phys += granule;
 	return 0;
 }
@@ -574,7 +608,7 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	int ret;
 
 	if (data->anchor) {
-		if (kvm_pte_valid(pte))
+		if (stage2_pte_is_counted(pte))
 			mm_ops->put_page(ptep);
 
 		return 0;
@@ -599,9 +633,10 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	 * a table. Accesses beyond 'end' that fall within the new table
 	 * will be mapped lazily.
 	 */
-	if (kvm_pte_valid(pte)) {
+	if (stage2_pte_is_counted(pte)) {
 		kvm_clear_pte(ptep);
-		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
+		if (kvm_pte_valid(pte))
+			kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
 		mm_ops->put_page(ptep);
 	}
 
@@ -683,6 +718,7 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 		.mmu		= pgt->mmu,
 		.memcache	= mc,
 		.mm_ops		= pgt->mm_ops,
+		.owner_id	= 0,
 	};
 	struct kvm_pgtable_walker walker = {
 		.cb		= stage2_map_walker,
@@ -696,6 +732,33 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 	if (ret)
 		return ret;
 
+	/* Set the valid flag to distinguish from the set_owner() path. */
+	map_data.attr |= KVM_PTE_VALID;
+
+	ret = kvm_pgtable_walk(pgt, addr, size, &walker);
+	dsb(ishst);
+	return ret;
+}
+
+int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
+				 void *mc, u32 owner_id)
+{
+	int ret;
+	struct stage2_map_data map_data = {
+		.mmu		= pgt->mmu,
+		.memcache	= mc,
+		.mm_ops		= pgt->mm_ops,
+		.owner_id	= owner_id,
+		.attr		= 0,
+	};
+	struct kvm_pgtable_walker walker = {
+		.cb		= stage2_map_walker,
+		.flags		= KVM_PGTABLE_WALK_TABLE_PRE |
+				  KVM_PGTABLE_WALK_LEAF |
+				  KVM_PGTABLE_WALK_TABLE_POST,
+		.arg		= &map_data,
+	};
+
 	ret = kvm_pgtable_walk(pgt, addr, size, &walker);
 	dsb(ishst);
 	return ret;
@@ -725,8 +788,13 @@ static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	kvm_pte_t pte = *ptep, *childp = NULL;
 	bool need_flush = false;
 
-	if (!kvm_pte_valid(pte))
+	if (!kvm_pte_valid(pte)) {
+		if (stage2_pte_is_counted(pte)) {
+			kvm_clear_pte(ptep);
+			mm_ops->put_page(ptep);
+		}
 		return 0;
+	}
 
 	if (kvm_pte_table(pte, level)) {
 		childp = kvm_pte_follow(pte, mm_ops);
@@ -948,7 +1016,7 @@ static int stage2_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	struct kvm_pgtable_mm_ops *mm_ops = arg;
 	kvm_pte_t pte = *ptep;
 
-	if (!kvm_pte_valid(pte))
+	if (!stage2_pte_is_counted(pte))
 		return 0;
 
 	mm_ops->put_page(ptep);
-- 
2.30.1.766.gb4fecdf3b7-goog

