* [PATCH 0/7] KVM: x86/MMU: Factor rmap operations out of mmu.c
@ 2022-12-06 17:35 Ben Gardon
2022-12-06 17:35 ` [PATCH 1/7] KVM: x86/MMU: Move pte_list operations to rmap.c Ben Gardon
` (7 more replies)
0 siblings, 8 replies; 21+ messages in thread
From: Ben Gardon @ 2022-12-06 17:35 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, David Matlack,
Vipin Sharma, Ben Gardon
Move the basic operations for manipulating an rmap out of mmu.c, into
their own files.
This is the first step in a series of series to split the Shadow MMU out of
mmu.c. Splitting the shadow MMU out results in about a 50% reduction in line
count of mmu.c, from ~7K to ~3.5K. The rmap operations only represent about
10% of the Shadow MMU but are split out first because they are relatively
self-contained.
This split may also help facilitate the addition of KUnit tests for rmap
manipulation, especially the fiddly bit flag which differentiates a direct
pointer from a pte_list_desc.
This effort is complementary to David Matlack's proposal to refactor
the TDP MMU into an arch-neutral implementation because it further
clarifies the distinction between the Shadow MMU and TDP MMU within x86.
This series contains no functional changes. It's just a direct movement
of code from one file to another.
Whether this rmap code should stay in its own file or get merged with
the rest of the shadow MMU code as it is factored out is open for
debate.
Ben Gardon (7):
KVM: x86/MMU: Move pte_list operations to rmap.c
KVM: x86/MMU: Move rmap_iterator to rmap.h
KVM: x86/MMU: Move gfn_to_rmap() to rmap.c
KVM: x86/MMU: Move rmap_can_add() and rmap_remove() to rmap.c
KVM: x86/MMU: Move the rmap walk iterator out of mmu.c
KVM: x86/MMU: Move rmap zap operations to rmap.c
KVM: x86/MMU: Move rmap_add() to rmap.c
arch/x86/kvm/Makefile | 2 +-
arch/x86/kvm/debugfs.c | 1 +
arch/x86/kvm/mmu/mmu.c | 438 +-------------------------------
arch/x86/kvm/mmu/mmu_internal.h | 9 +-
arch/x86/kvm/mmu/rmap.c | 368 +++++++++++++++++++++++++++
arch/x86/kvm/mmu/rmap.h | 106 ++++++++
6 files changed, 492 insertions(+), 432 deletions(-)
create mode 100644 arch/x86/kvm/mmu/rmap.c
create mode 100644 arch/x86/kvm/mmu/rmap.h
--
2.39.0.rc0.267.gcb52ba06e7-goog
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 1/7] KVM: x86/MMU: Move pte_list operations to rmap.c
2022-12-06 17:35 [PATCH 0/7] KVM: x86/MMU: Factor rmap operations out of mmu.c Ben Gardon
@ 2022-12-06 17:35 ` Ben Gardon
2022-12-07 22:58 ` Vipin Sharma
2022-12-09 22:22 ` David Matlack
2022-12-06 17:35 ` [PATCH 2/7] KVM: x86/MMU: Move rmap_iterator to rmap.h Ben Gardon
` (6 subsequent siblings)
7 siblings, 2 replies; 21+ messages in thread
From: Ben Gardon @ 2022-12-06 17:35 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, David Matlack,
Vipin Sharma, Ben Gardon
In the interest of eventually splitting the Shadow MMU out of mmu.c,
start by moving some of the operations for manipulating pte_lists out of
mmu.c and into a new pair of files: rmap.c and rmap.h.
No functional change intended.
Signed-off-by: Ben Gardon <bgardon@google.com>
---
arch/x86/kvm/Makefile | 2 +-
arch/x86/kvm/debugfs.c | 1 +
arch/x86/kvm/mmu/mmu.c | 152 +-------------------------------
arch/x86/kvm/mmu/mmu_internal.h | 1 -
arch/x86/kvm/mmu/rmap.c | 141 +++++++++++++++++++++++++++++
arch/x86/kvm/mmu/rmap.h | 34 +++++++
6 files changed, 179 insertions(+), 152 deletions(-)
create mode 100644 arch/x86/kvm/mmu/rmap.c
create mode 100644 arch/x86/kvm/mmu/rmap.h
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 80e3fe184d17..9f766eebeddf 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -12,7 +12,7 @@ include $(srctree)/virt/kvm/Makefile.kvm
kvm-y += x86.o emulate.o i8259.o irq.o lapic.o \
i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \
hyperv.o debugfs.o mmu/mmu.o mmu/page_track.o \
- mmu/spte.o
+ mmu/spte.o mmu/rmap.o
ifdef CONFIG_HYPERV
kvm-y += kvm_onhyperv.o
diff --git a/arch/x86/kvm/debugfs.c b/arch/x86/kvm/debugfs.c
index c1390357126a..29f692ecd6f3 100644
--- a/arch/x86/kvm/debugfs.c
+++ b/arch/x86/kvm/debugfs.c
@@ -9,6 +9,7 @@
#include "lapic.h"
#include "mmu.h"
#include "mmu/mmu_internal.h"
+#include "mmu/rmap.h"
static int vcpu_get_timer_advance_ns(void *data, u64 *val)
{
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4736d7849c60..90b3735d6064 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -26,6 +26,7 @@
#include "kvm_emulate.h"
#include "cpuid.h"
#include "spte.h"
+#include "rmap.h"
#include <linux/kvm_host.h>
#include <linux/types.h>
@@ -112,24 +113,6 @@ module_param(dbg, bool, 0644);
#include <trace/events/kvm.h>
-/* make pte_list_desc fit well in cache lines */
-#define PTE_LIST_EXT 14
-
-/*
- * Slight optimization of cacheline layout, by putting `more' and `spte_count'
- * at the start; then accessing it will only use one single cacheline for
- * either full (entries==PTE_LIST_EXT) case or entries<=6.
- */
-struct pte_list_desc {
- struct pte_list_desc *more;
- /*
- * Stores number of entries stored in the pte_list_desc. No need to be
- * u64 but just for easier alignment. When PTE_LIST_EXT, means full.
- */
- u64 spte_count;
- u64 *sptes[PTE_LIST_EXT];
-};
-
struct kvm_shadow_walk_iterator {
u64 addr;
hpa_t shadow_addr;
@@ -155,7 +138,6 @@ struct kvm_shadow_walk_iterator {
({ spte = mmu_spte_get_lockless(_walker.sptep); 1; }); \
__shadow_walk_next(&(_walker), spte))
-static struct kmem_cache *pte_list_desc_cache;
struct kmem_cache *mmu_page_header_cache;
static struct percpu_counter kvm_total_used_mmu_pages;
@@ -674,11 +656,6 @@ static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
}
-static void mmu_free_pte_list_desc(struct pte_list_desc *pte_list_desc)
-{
- kmem_cache_free(pte_list_desc_cache, pte_list_desc);
-}
-
static bool sp_has_gptes(struct kvm_mmu_page *sp);
static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
@@ -878,111 +855,6 @@ gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t gfn,
return slot;
}
-/*
- * About rmap_head encoding:
- *
- * If the bit zero of rmap_head->val is clear, then it points to the only spte
- * in this rmap chain. Otherwise, (rmap_head->val & ~1) points to a struct
- * pte_list_desc containing more mappings.
- */
-
-/*
- * Returns the number of pointers in the rmap chain, not counting the new one.
- */
-static int pte_list_add(struct kvm_mmu_memory_cache *cache, u64 *spte,
- struct kvm_rmap_head *rmap_head)
-{
- struct pte_list_desc *desc;
- int count = 0;
-
- if (!rmap_head->val) {
- rmap_printk("%p %llx 0->1\n", spte, *spte);
- rmap_head->val = (unsigned long)spte;
- } else if (!(rmap_head->val & 1)) {
- rmap_printk("%p %llx 1->many\n", spte, *spte);
- desc = kvm_mmu_memory_cache_alloc(cache);
- desc->sptes[0] = (u64 *)rmap_head->val;
- desc->sptes[1] = spte;
- desc->spte_count = 2;
- rmap_head->val = (unsigned long)desc | 1;
- ++count;
- } else {
- rmap_printk("%p %llx many->many\n", spte, *spte);
- desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
- while (desc->spte_count == PTE_LIST_EXT) {
- count += PTE_LIST_EXT;
- if (!desc->more) {
- desc->more = kvm_mmu_memory_cache_alloc(cache);
- desc = desc->more;
- desc->spte_count = 0;
- break;
- }
- desc = desc->more;
- }
- count += desc->spte_count;
- desc->sptes[desc->spte_count++] = spte;
- }
- return count;
-}
-
-static void
-pte_list_desc_remove_entry(struct kvm_rmap_head *rmap_head,
- struct pte_list_desc *desc, int i,
- struct pte_list_desc *prev_desc)
-{
- int j = desc->spte_count - 1;
-
- desc->sptes[i] = desc->sptes[j];
- desc->sptes[j] = NULL;
- desc->spte_count--;
- if (desc->spte_count)
- return;
- if (!prev_desc && !desc->more)
- rmap_head->val = 0;
- else
- if (prev_desc)
- prev_desc->more = desc->more;
- else
- rmap_head->val = (unsigned long)desc->more | 1;
- mmu_free_pte_list_desc(desc);
-}
-
-static void pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head)
-{
- struct pte_list_desc *desc;
- struct pte_list_desc *prev_desc;
- int i;
-
- if (!rmap_head->val) {
- pr_err("%s: %p 0->BUG\n", __func__, spte);
- BUG();
- } else if (!(rmap_head->val & 1)) {
- rmap_printk("%p 1->0\n", spte);
- if ((u64 *)rmap_head->val != spte) {
- pr_err("%s: %p 1->BUG\n", __func__, spte);
- BUG();
- }
- rmap_head->val = 0;
- } else {
- rmap_printk("%p many->many\n", spte);
- desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
- prev_desc = NULL;
- while (desc) {
- for (i = 0; i < desc->spte_count; ++i) {
- if (desc->sptes[i] == spte) {
- pte_list_desc_remove_entry(rmap_head,
- desc, i, prev_desc);
- return;
- }
- }
- prev_desc = desc;
- desc = desc->more;
- }
- pr_err("%s: %p many->many\n", __func__, spte);
- BUG();
- }
-}
-
static void kvm_zap_one_rmap_spte(struct kvm *kvm,
struct kvm_rmap_head *rmap_head, u64 *sptep)
{
@@ -1011,7 +883,7 @@ static bool kvm_zap_all_rmap_sptes(struct kvm *kvm,
for (i = 0; i < desc->spte_count; i++)
mmu_spte_clear_track_bits(kvm, desc->sptes[i]);
next = desc->more;
- mmu_free_pte_list_desc(desc);
+ free_pte_list_desc(desc);
}
out:
/* rmap_head is meaningless now, remember to reset it */
@@ -1019,26 +891,6 @@ static bool kvm_zap_all_rmap_sptes(struct kvm *kvm,
return true;
}
-unsigned int pte_list_count(struct kvm_rmap_head *rmap_head)
-{
- struct pte_list_desc *desc;
- unsigned int count = 0;
-
- if (!rmap_head->val)
- return 0;
- else if (!(rmap_head->val & 1))
- return 1;
-
- desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
-
- while (desc) {
- count += desc->spte_count;
- desc = desc->more;
- }
-
- return count;
-}
-
static struct kvm_rmap_head *gfn_to_rmap(gfn_t gfn, int level,
const struct kvm_memory_slot *slot)
{
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index dbaf6755c5a7..cd1c8f32269d 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -166,7 +166,6 @@ bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
int min_level);
void kvm_flush_remote_tlbs_with_address(struct kvm *kvm,
u64 start_gfn, u64 pages);
-unsigned int pte_list_count(struct kvm_rmap_head *rmap_head);
extern int nx_huge_pages;
static inline bool is_nx_huge_page_enabled(struct kvm *kvm)
diff --git a/arch/x86/kvm/mmu/rmap.c b/arch/x86/kvm/mmu/rmap.c
new file mode 100644
index 000000000000..daa99dee0709
--- /dev/null
+++ b/arch/x86/kvm/mmu/rmap.c
@@ -0,0 +1,141 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "mmu.h"
+#include "mmu_internal.h"
+#include "mmutrace.h"
+#include "rmap.h"
+#include "spte.h"
+
+#include <asm/cmpxchg.h>
+#include <trace/events/kvm.h>
+
+/*
+ * About rmap_head encoding:
+ *
+ * If the bit zero of rmap_head->val is clear, then it points to the only spte
+ * in this rmap chain. Otherwise, (rmap_head->val & ~1) points to a struct
+ * pte_list_desc containing more mappings.
+ */
+
+/*
+ * Returns the number of pointers in the rmap chain, not counting the new one.
+ */
+int pte_list_add(struct kvm_mmu_memory_cache *cache, u64 *spte,
+ struct kvm_rmap_head *rmap_head)
+{
+ struct pte_list_desc *desc;
+ int count = 0;
+
+ if (!rmap_head->val) {
+ rmap_printk("%p %llx 0->1\n", spte, *spte);
+ rmap_head->val = (unsigned long)spte;
+ } else if (!(rmap_head->val & 1)) {
+ rmap_printk("%p %llx 1->many\n", spte, *spte);
+ desc = kvm_mmu_memory_cache_alloc(cache);
+ desc->sptes[0] = (u64 *)rmap_head->val;
+ desc->sptes[1] = spte;
+ desc->spte_count = 2;
+ rmap_head->val = (unsigned long)desc | 1;
+ ++count;
+ } else {
+ rmap_printk("%p %llx many->many\n", spte, *spte);
+ desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
+ while (desc->spte_count == PTE_LIST_EXT) {
+ count += PTE_LIST_EXT;
+ if (!desc->more) {
+ desc->more = kvm_mmu_memory_cache_alloc(cache);
+ desc = desc->more;
+ desc->spte_count = 0;
+ break;
+ }
+ desc = desc->more;
+ }
+ count += desc->spte_count;
+ desc->sptes[desc->spte_count++] = spte;
+ }
+ return count;
+}
+
+void free_pte_list_desc(struct pte_list_desc *pte_list_desc)
+{
+ kmem_cache_free(pte_list_desc_cache, pte_list_desc);
+}
+
+static void
+pte_list_desc_remove_entry(struct kvm_rmap_head *rmap_head,
+ struct pte_list_desc *desc, int i,
+ struct pte_list_desc *prev_desc)
+{
+ int j = desc->spte_count - 1;
+
+ desc->sptes[i] = desc->sptes[j];
+ desc->sptes[j] = NULL;
+ desc->spte_count--;
+ if (desc->spte_count)
+ return;
+ if (!prev_desc && !desc->more)
+ rmap_head->val = 0;
+ else
+ if (prev_desc)
+ prev_desc->more = desc->more;
+ else
+ rmap_head->val = (unsigned long)desc->more | 1;
+ free_pte_list_desc(desc);
+}
+
+void pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head)
+{
+ struct pte_list_desc *desc;
+ struct pte_list_desc *prev_desc;
+ int i;
+
+ if (!rmap_head->val) {
+ pr_err("%s: %p 0->BUG\n", __func__, spte);
+ BUG();
+ } else if (!(rmap_head->val & 1)) {
+ rmap_printk("%p 1->0\n", spte);
+ if ((u64 *)rmap_head->val != spte) {
+ pr_err("%s: %p 1->BUG\n", __func__, spte);
+ BUG();
+ }
+ rmap_head->val = 0;
+ } else {
+ rmap_printk("%p many->many\n", spte);
+ desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
+ prev_desc = NULL;
+ while (desc) {
+ for (i = 0; i < desc->spte_count; ++i) {
+ if (desc->sptes[i] == spte) {
+ pte_list_desc_remove_entry(rmap_head,
+ desc, i, prev_desc);
+ return;
+ }
+ }
+ prev_desc = desc;
+ desc = desc->more;
+ }
+ pr_err("%s: %p many->many\n", __func__, spte);
+ BUG();
+ }
+}
+
+unsigned int pte_list_count(struct kvm_rmap_head *rmap_head)
+{
+ struct pte_list_desc *desc;
+ unsigned int count = 0;
+
+ if (!rmap_head->val)
+ return 0;
+ else if (!(rmap_head->val & 1))
+ return 1;
+
+ desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
+
+ while (desc) {
+ count += desc->spte_count;
+ desc = desc->more;
+ }
+
+ return count;
+}
+
diff --git a/arch/x86/kvm/mmu/rmap.h b/arch/x86/kvm/mmu/rmap.h
new file mode 100644
index 000000000000..059765b6e066
--- /dev/null
+++ b/arch/x86/kvm/mmu/rmap.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __KVM_X86_MMU_RMAP_H
+#define __KVM_X86_MMU_RMAP_H
+
+#include <linux/kvm_host.h>
+
+/* make pte_list_desc fit well in cache lines */
+#define PTE_LIST_EXT 14
+
+/*
+ * Slight optimization of cacheline layout, by putting `more' and `spte_count'
+ * at the start; then accessing it will only use one single cacheline for
+ * either full (entries==PTE_LIST_EXT) case or entries<=6.
+ */
+struct pte_list_desc {
+ struct pte_list_desc *more;
+ /*
+ * Stores number of entries stored in the pte_list_desc. No need to be
+ * u64 but just for easier alignment. When PTE_LIST_EXT, means full.
+ */
+ u64 spte_count;
+ u64 *sptes[PTE_LIST_EXT];
+};
+
+static struct kmem_cache *pte_list_desc_cache;
+
+int pte_list_add(struct kvm_mmu_memory_cache *cache, u64 *spte,
+ struct kvm_rmap_head *rmap_head);
+void free_pte_list_desc(struct pte_list_desc *pte_list_desc);
+void pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head);
+unsigned int pte_list_count(struct kvm_rmap_head *rmap_head);
+
+#endif /* __KVM_X86_MMU_RMAP_H */
--
2.39.0.rc0.267.gcb52ba06e7-goog
* [PATCH 2/7] KVM: x86/MMU: Move rmap_iterator to rmap.h
2022-12-06 17:35 [PATCH 0/7] KVM: x86/MMU: Factor rmap operations out of mmu.c Ben Gardon
2022-12-06 17:35 ` [PATCH 1/7] KVM: x86/MMU: Move pte_list operations to rmap.c Ben Gardon
@ 2022-12-06 17:35 ` Ben Gardon
2022-12-09 23:04 ` David Matlack
2022-12-06 17:35 ` [PATCH 3/7] KVM: x86/MMU: Move gfn_to_rmap() to rmap.c Ben Gardon
` (5 subsequent siblings)
7 siblings, 1 reply; 21+ messages in thread
From: Ben Gardon @ 2022-12-06 17:35 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, David Matlack,
Vipin Sharma, Ben Gardon
In continuing to factor the rmap out of mmu.c, move the rmap_iterator
and associated functions and macros into rmap.(c|h).
No functional change intended.
Signed-off-by: Ben Gardon <bgardon@google.com>
---
arch/x86/kvm/mmu/mmu.c | 76 -----------------------------------------
arch/x86/kvm/mmu/rmap.c | 61 +++++++++++++++++++++++++++++++++
arch/x86/kvm/mmu/rmap.h | 18 ++++++++++
3 files changed, 79 insertions(+), 76 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 90b3735d6064..c3a7f443a213 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -932,82 +932,6 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
pte_list_remove(spte, rmap_head);
}
-/*
- * Used by the following functions to iterate through the sptes linked by a
- * rmap. All fields are private and not assumed to be used outside.
- */
-struct rmap_iterator {
- /* private fields */
- struct pte_list_desc *desc; /* holds the sptep if not NULL */
- int pos; /* index of the sptep */
-};
-
-/*
- * Iteration must be started by this function. This should also be used after
- * removing/dropping sptes from the rmap link because in such cases the
- * information in the iterator may not be valid.
- *
- * Returns sptep if found, NULL otherwise.
- */
-static u64 *rmap_get_first(struct kvm_rmap_head *rmap_head,
- struct rmap_iterator *iter)
-{
- u64 *sptep;
-
- if (!rmap_head->val)
- return NULL;
-
- if (!(rmap_head->val & 1)) {
- iter->desc = NULL;
- sptep = (u64 *)rmap_head->val;
- goto out;
- }
-
- iter->desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
- iter->pos = 0;
- sptep = iter->desc->sptes[iter->pos];
-out:
- BUG_ON(!is_shadow_present_pte(*sptep));
- return sptep;
-}
-
-/*
- * Must be used with a valid iterator: e.g. after rmap_get_first().
- *
- * Returns sptep if found, NULL otherwise.
- */
-static u64 *rmap_get_next(struct rmap_iterator *iter)
-{
- u64 *sptep;
-
- if (iter->desc) {
- if (iter->pos < PTE_LIST_EXT - 1) {
- ++iter->pos;
- sptep = iter->desc->sptes[iter->pos];
- if (sptep)
- goto out;
- }
-
- iter->desc = iter->desc->more;
-
- if (iter->desc) {
- iter->pos = 0;
- /* desc->sptes[0] cannot be NULL */
- sptep = iter->desc->sptes[iter->pos];
- goto out;
- }
- }
-
- return NULL;
-out:
- BUG_ON(!is_shadow_present_pte(*sptep));
- return sptep;
-}
-
-#define for_each_rmap_spte(_rmap_head_, _iter_, _spte_) \
- for (_spte_ = rmap_get_first(_rmap_head_, _iter_); \
- _spte_; _spte_ = rmap_get_next(_iter_))
-
static void drop_spte(struct kvm *kvm, u64 *sptep)
{
u64 old_spte = mmu_spte_clear_track_bits(kvm, sptep);
diff --git a/arch/x86/kvm/mmu/rmap.c b/arch/x86/kvm/mmu/rmap.c
index daa99dee0709..c3bad366b627 100644
--- a/arch/x86/kvm/mmu/rmap.c
+++ b/arch/x86/kvm/mmu/rmap.c
@@ -139,3 +139,64 @@ unsigned int pte_list_count(struct kvm_rmap_head *rmap_head)
return count;
}
+/*
+ * Iteration must be started by this function. This should also be used after
+ * removing/dropping sptes from the rmap link because in such cases the
+ * information in the iterator may not be valid.
+ *
+ * Returns sptep if found, NULL otherwise.
+ */
+u64 *rmap_get_first(struct kvm_rmap_head *rmap_head, struct rmap_iterator *iter)
+{
+ u64 *sptep;
+
+ if (!rmap_head->val)
+ return NULL;
+
+ if (!(rmap_head->val & 1)) {
+ iter->desc = NULL;
+ sptep = (u64 *)rmap_head->val;
+ goto out;
+ }
+
+ iter->desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
+ iter->pos = 0;
+ sptep = iter->desc->sptes[iter->pos];
+out:
+ BUG_ON(!is_shadow_present_pte(*sptep));
+ return sptep;
+}
+
+/*
+ * Must be used with a valid iterator: e.g. after rmap_get_first().
+ *
+ * Returns sptep if found, NULL otherwise.
+ */
+u64 *rmap_get_next(struct rmap_iterator *iter)
+{
+ u64 *sptep;
+
+ if (iter->desc) {
+ if (iter->pos < PTE_LIST_EXT - 1) {
+ ++iter->pos;
+ sptep = iter->desc->sptes[iter->pos];
+ if (sptep)
+ goto out;
+ }
+
+ iter->desc = iter->desc->more;
+
+ if (iter->desc) {
+ iter->pos = 0;
+ /* desc->sptes[0] cannot be NULL */
+ sptep = iter->desc->sptes[iter->pos];
+ goto out;
+ }
+ }
+
+ return NULL;
+out:
+ BUG_ON(!is_shadow_present_pte(*sptep));
+ return sptep;
+}
+
diff --git a/arch/x86/kvm/mmu/rmap.h b/arch/x86/kvm/mmu/rmap.h
index 059765b6e066..13b265f3a95e 100644
--- a/arch/x86/kvm/mmu/rmap.h
+++ b/arch/x86/kvm/mmu/rmap.h
@@ -31,4 +31,22 @@ void free_pte_list_desc(struct pte_list_desc *pte_list_desc);
void pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head);
unsigned int pte_list_count(struct kvm_rmap_head *rmap_head);
+/*
+ * Used by the following functions to iterate through the sptes linked by a
+ * rmap. All fields are private and not assumed to be used outside.
+ */
+struct rmap_iterator {
+ /* private fields */
+ struct pte_list_desc *desc; /* holds the sptep if not NULL */
+ int pos; /* index of the sptep */
+};
+
+u64 *rmap_get_first(struct kvm_rmap_head *rmap_head,
+ struct rmap_iterator *iter);
+u64 *rmap_get_next(struct rmap_iterator *iter);
+
+#define for_each_rmap_spte(_rmap_head_, _iter_, _spte_) \
+ for (_spte_ = rmap_get_first(_rmap_head_, _iter_); \
+ _spte_; _spte_ = rmap_get_next(_iter_))
+
#endif /* __KVM_X86_MMU_RMAP_H */
--
2.39.0.rc0.267.gcb52ba06e7-goog
* [PATCH 3/7] KVM: x86/MMU: Move gfn_to_rmap() to rmap.c
2022-12-06 17:35 [PATCH 0/7] KVM: x86/MMU: Factor rmap operations out of mmu.c Ben Gardon
2022-12-06 17:35 ` [PATCH 1/7] KVM: x86/MMU: Move pte_list operations to rmap.c Ben Gardon
2022-12-06 17:35 ` [PATCH 2/7] KVM: x86/MMU: Move rmap_iterator to rmap.h Ben Gardon
@ 2022-12-06 17:35 ` Ben Gardon
2022-12-09 23:32 ` David Matlack
2022-12-06 17:35 ` [PATCH 4/7] KVM: x86/MMU: Move rmap_can_add() and rmap_remove() " Ben Gardon
` (4 subsequent siblings)
7 siblings, 1 reply; 21+ messages in thread
From: Ben Gardon @ 2022-12-06 17:35 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, David Matlack,
Vipin Sharma, Ben Gardon
Move gfn_to_rmap() to rmap.c. While the function is not part of
manipulating the rmap, it is the main way that the MMU gets pointers to
the rmaps.
No functional change intended.
Signed-off-by: Ben Gardon <bgardon@google.com>
---
arch/x86/kvm/mmu/mmu.c | 9 ---------
arch/x86/kvm/mmu/rmap.c | 8 ++++++++
arch/x86/kvm/mmu/rmap.h | 2 ++
3 files changed, 10 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c3a7f443a213..f8d7201210c8 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -891,15 +891,6 @@ static bool kvm_zap_all_rmap_sptes(struct kvm *kvm,
return true;
}
-static struct kvm_rmap_head *gfn_to_rmap(gfn_t gfn, int level,
- const struct kvm_memory_slot *slot)
-{
- unsigned long idx;
-
- idx = gfn_to_index(gfn, slot->base_gfn, level);
- return &slot->arch.rmap[level - PG_LEVEL_4K][idx];
-}
-
static bool rmap_can_add(struct kvm_vcpu *vcpu)
{
struct kvm_mmu_memory_cache *mc;
diff --git a/arch/x86/kvm/mmu/rmap.c b/arch/x86/kvm/mmu/rmap.c
index c3bad366b627..272e89147d96 100644
--- a/arch/x86/kvm/mmu/rmap.c
+++ b/arch/x86/kvm/mmu/rmap.c
@@ -200,3 +200,11 @@ u64 *rmap_get_next(struct rmap_iterator *iter)
return sptep;
}
+struct kvm_rmap_head *gfn_to_rmap(gfn_t gfn, int level,
+ const struct kvm_memory_slot *slot)
+{
+ unsigned long idx;
+
+ idx = gfn_to_index(gfn, slot->base_gfn, level);
+ return &slot->arch.rmap[level - PG_LEVEL_4K][idx];
+}
diff --git a/arch/x86/kvm/mmu/rmap.h b/arch/x86/kvm/mmu/rmap.h
index 13b265f3a95e..45732eda57e5 100644
--- a/arch/x86/kvm/mmu/rmap.h
+++ b/arch/x86/kvm/mmu/rmap.h
@@ -49,4 +49,6 @@ u64 *rmap_get_next(struct rmap_iterator *iter);
for (_spte_ = rmap_get_first(_rmap_head_, _iter_); \
_spte_; _spte_ = rmap_get_next(_iter_))
+struct kvm_rmap_head *gfn_to_rmap(gfn_t gfn, int level,
+ const struct kvm_memory_slot *slot);
#endif /* __KVM_X86_MMU_RMAP_H */
--
2.39.0.rc0.267.gcb52ba06e7-goog
* [PATCH 4/7] KVM: x86/MMU: Move rmap_can_add() and rmap_remove() to rmap.c
2022-12-06 17:35 [PATCH 0/7] KVM: x86/MMU: Factor rmap operations out of mmu.c Ben Gardon
` (2 preceding siblings ...)
2022-12-06 17:35 ` [PATCH 3/7] KVM: x86/MMU: Move gfn_to_rmap() to rmap.c Ben Gardon
@ 2022-12-06 17:35 ` Ben Gardon
2022-12-06 17:35 ` [PATCH 5/7] KVM: x86/MMU: Move the rmap walk iterator out of mmu.c Ben Gardon
` (3 subsequent siblings)
7 siblings, 0 replies; 21+ messages in thread
From: Ben Gardon @ 2022-12-06 17:35 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, David Matlack,
Vipin Sharma, Ben Gardon
Move the functions which check whether an entry can be added to an rmap and
which remove elements from an rmap to rmap.(c|h).
No functional change intended.
Signed-off-by: Ben Gardon <bgardon@google.com>
---
arch/x86/kvm/mmu/mmu.c | 34 +--------------------------------
arch/x86/kvm/mmu/mmu_internal.h | 1 +
arch/x86/kvm/mmu/rmap.c | 32 +++++++++++++++++++++++++++++++
arch/x86/kvm/mmu/rmap.h | 3 +++
4 files changed, 37 insertions(+), 33 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f8d7201210c8..52e487d89d54 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -658,7 +658,7 @@ static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
static bool sp_has_gptes(struct kvm_mmu_page *sp);
-static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
+gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
{
if (sp->role.passthrough)
return sp->gfn;
@@ -891,38 +891,6 @@ static bool kvm_zap_all_rmap_sptes(struct kvm *kvm,
return true;
}
-static bool rmap_can_add(struct kvm_vcpu *vcpu)
-{
- struct kvm_mmu_memory_cache *mc;
-
- mc = &vcpu->arch.mmu_pte_list_desc_cache;
- return kvm_mmu_memory_cache_nr_free_objects(mc);
-}
-
-static void rmap_remove(struct kvm *kvm, u64 *spte)
-{
- struct kvm_memslots *slots;
- struct kvm_memory_slot *slot;
- struct kvm_mmu_page *sp;
- gfn_t gfn;
- struct kvm_rmap_head *rmap_head;
-
- sp = sptep_to_sp(spte);
- gfn = kvm_mmu_page_get_gfn(sp, spte_index(spte));
-
- /*
- * Unlike rmap_add, rmap_remove does not run in the context of a vCPU
- * so we have to determine which memslots to use based on context
- * information in sp->role.
- */
- slots = kvm_memslots_for_spte_role(kvm, sp->role);
-
- slot = __gfn_to_memslot(slots, gfn);
- rmap_head = gfn_to_rmap(gfn, sp->role.level, slot);
-
- pte_list_remove(spte, rmap_head);
-}
-
static void drop_spte(struct kvm *kvm, u64 *sptep)
{
u64 old_spte = mmu_spte_clear_track_bits(kvm, sptep);
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index cd1c8f32269d..3de703c2a5d4 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -318,4 +318,5 @@ void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp);
void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp);
+gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index);
#endif /* __KVM_X86_MMU_INTERNAL_H */
diff --git a/arch/x86/kvm/mmu/rmap.c b/arch/x86/kvm/mmu/rmap.c
index 272e89147d96..6833676aa9ea 100644
--- a/arch/x86/kvm/mmu/rmap.c
+++ b/arch/x86/kvm/mmu/rmap.c
@@ -208,3 +208,35 @@ struct kvm_rmap_head *gfn_to_rmap(gfn_t gfn, int level,
idx = gfn_to_index(gfn, slot->base_gfn, level);
return &slot->arch.rmap[level - PG_LEVEL_4K][idx];
}
+
+bool rmap_can_add(struct kvm_vcpu *vcpu)
+{
+ struct kvm_mmu_memory_cache *mc;
+
+ mc = &vcpu->arch.mmu_pte_list_desc_cache;
+ return kvm_mmu_memory_cache_nr_free_objects(mc);
+}
+
+void rmap_remove(struct kvm *kvm, u64 *spte)
+{
+ struct kvm_memslots *slots;
+ struct kvm_memory_slot *slot;
+ struct kvm_mmu_page *sp;
+ gfn_t gfn;
+ struct kvm_rmap_head *rmap_head;
+
+ sp = sptep_to_sp(spte);
+ gfn = kvm_mmu_page_get_gfn(sp, spte_index(spte));
+
+ /*
+ * Unlike rmap_add, rmap_remove does not run in the context of a vCPU
+ * so we have to determine which memslots to use based on context
+ * information in sp->role.
+ */
+ slots = kvm_memslots_for_spte_role(kvm, sp->role);
+
+ slot = __gfn_to_memslot(slots, gfn);
+ rmap_head = gfn_to_rmap(gfn, sp->role.level, slot);
+
+ pte_list_remove(spte, rmap_head);
+}
diff --git a/arch/x86/kvm/mmu/rmap.h b/arch/x86/kvm/mmu/rmap.h
index 45732eda57e5..81df186ba3c3 100644
--- a/arch/x86/kvm/mmu/rmap.h
+++ b/arch/x86/kvm/mmu/rmap.h
@@ -51,4 +51,7 @@ u64 *rmap_get_next(struct rmap_iterator *iter);
struct kvm_rmap_head *gfn_to_rmap(gfn_t gfn, int level,
const struct kvm_memory_slot *slot);
+
+bool rmap_can_add(struct kvm_vcpu *vcpu);
+void rmap_remove(struct kvm *kvm, u64 *spte);
#endif /* __KVM_X86_MMU_RMAP_H */
--
2.39.0.rc0.267.gcb52ba06e7-goog
* [PATCH 5/7] KVM: x86/MMU: Move the rmap walk iterator out of mmu.c
2022-12-06 17:35 [PATCH 0/7] KVM: x86/MMU: Factor rmap operations out of mmu.c Ben Gardon
` (3 preceding siblings ...)
2022-12-06 17:35 ` [PATCH 4/7] KVM: x86/MMU: Move rmap_can_add() and rmap_remove() " Ben Gardon
@ 2022-12-06 17:35 ` Ben Gardon
2022-12-06 17:36 ` [PATCH 6/7] KVM: x86/MMU: Move rmap zap operations to rmap.c Ben Gardon
` (2 subsequent siblings)
7 siblings, 0 replies; 21+ messages in thread
From: Ben Gardon @ 2022-12-06 17:35 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, David Matlack,
Vipin Sharma, Ben Gardon
Move slot_rmap_walk_iterator and its associated functions out of mmu.c
to rmap.(c|h).
No functional change intended.
Signed-off-by: Ben Gardon <bgardon@google.com>
---
arch/x86/kvm/mmu/mmu.c | 73 -----------------------------------------
arch/x86/kvm/mmu/rmap.c | 43 ++++++++++++++++++++++++
arch/x86/kvm/mmu/rmap.h | 36 ++++++++++++++++++++
3 files changed, 79 insertions(+), 73 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 52e487d89d54..88da2abc2375 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1198,79 +1198,6 @@ static bool kvm_set_pte_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
return need_flush;
}
-struct slot_rmap_walk_iterator {
- /* input fields. */
- const struct kvm_memory_slot *slot;
- gfn_t start_gfn;
- gfn_t end_gfn;
- int start_level;
- int end_level;
-
- /* output fields. */
- gfn_t gfn;
- struct kvm_rmap_head *rmap;
- int level;
-
- /* private field. */
- struct kvm_rmap_head *end_rmap;
-};
-
-static void
-rmap_walk_init_level(struct slot_rmap_walk_iterator *iterator, int level)
-{
- iterator->level = level;
- iterator->gfn = iterator->start_gfn;
- iterator->rmap = gfn_to_rmap(iterator->gfn, level, iterator->slot);
- iterator->end_rmap = gfn_to_rmap(iterator->end_gfn, level, iterator->slot);
-}
-
-static void
-slot_rmap_walk_init(struct slot_rmap_walk_iterator *iterator,
- const struct kvm_memory_slot *slot, int start_level,
- int end_level, gfn_t start_gfn, gfn_t end_gfn)
-{
- iterator->slot = slot;
- iterator->start_level = start_level;
- iterator->end_level = end_level;
- iterator->start_gfn = start_gfn;
- iterator->end_gfn = end_gfn;
-
- rmap_walk_init_level(iterator, iterator->start_level);
-}
-
-static bool slot_rmap_walk_okay(struct slot_rmap_walk_iterator *iterator)
-{
- return !!iterator->rmap;
-}
-
-static void slot_rmap_walk_next(struct slot_rmap_walk_iterator *iterator)
-{
- while (++iterator->rmap <= iterator->end_rmap) {
- iterator->gfn += (1UL << KVM_HPAGE_GFN_SHIFT(iterator->level));
-
- if (iterator->rmap->val)
- return;
- }
-
- if (++iterator->level > iterator->end_level) {
- iterator->rmap = NULL;
- return;
- }
-
- rmap_walk_init_level(iterator, iterator->level);
-}
-
-#define for_each_slot_rmap_range(_slot_, _start_level_, _end_level_, \
- _start_gfn, _end_gfn, _iter_) \
- for (slot_rmap_walk_init(_iter_, _slot_, _start_level_, \
- _end_level_, _start_gfn, _end_gfn); \
- slot_rmap_walk_okay(_iter_); \
- slot_rmap_walk_next(_iter_))
-
-typedef bool (*rmap_handler_t)(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
- struct kvm_memory_slot *slot, gfn_t gfn,
- int level, pte_t pte);
-
static __always_inline bool kvm_handle_gfn_range(struct kvm *kvm,
struct kvm_gfn_range *range,
rmap_handler_t handler)
diff --git a/arch/x86/kvm/mmu/rmap.c b/arch/x86/kvm/mmu/rmap.c
index 6833676aa9ea..91af5b32cffb 100644
--- a/arch/x86/kvm/mmu/rmap.c
+++ b/arch/x86/kvm/mmu/rmap.c
@@ -240,3 +240,46 @@ void rmap_remove(struct kvm *kvm, u64 *spte)
pte_list_remove(spte, rmap_head);
}
+
+void rmap_walk_init_level(struct slot_rmap_walk_iterator *iterator, int level)
+{
+ iterator->level = level;
+ iterator->gfn = iterator->start_gfn;
+ iterator->rmap = gfn_to_rmap(iterator->gfn, level, iterator->slot);
+ iterator->end_rmap = gfn_to_rmap(iterator->end_gfn, level, iterator->slot);
+}
+
+void slot_rmap_walk_init(struct slot_rmap_walk_iterator *iterator,
+ const struct kvm_memory_slot *slot, int start_level,
+ int end_level, gfn_t start_gfn, gfn_t end_gfn)
+{
+ iterator->slot = slot;
+ iterator->start_level = start_level;
+ iterator->end_level = end_level;
+ iterator->start_gfn = start_gfn;
+ iterator->end_gfn = end_gfn;
+
+ rmap_walk_init_level(iterator, iterator->start_level);
+}
+
+bool slot_rmap_walk_okay(struct slot_rmap_walk_iterator *iterator)
+{
+ return !!iterator->rmap;
+}
+
+void slot_rmap_walk_next(struct slot_rmap_walk_iterator *iterator)
+{
+ while (++iterator->rmap <= iterator->end_rmap) {
+ iterator->gfn += (1UL << KVM_HPAGE_GFN_SHIFT(iterator->level));
+
+ if (iterator->rmap->val)
+ return;
+ }
+
+ if (++iterator->level > iterator->end_level) {
+ iterator->rmap = NULL;
+ return;
+ }
+
+ rmap_walk_init_level(iterator, iterator->level);
+}
diff --git a/arch/x86/kvm/mmu/rmap.h b/arch/x86/kvm/mmu/rmap.h
index 81df186ba3c3..dc4bf7e609ec 100644
--- a/arch/x86/kvm/mmu/rmap.h
+++ b/arch/x86/kvm/mmu/rmap.h
@@ -54,4 +54,40 @@ struct kvm_rmap_head *gfn_to_rmap(gfn_t gfn, int level,
bool rmap_can_add(struct kvm_vcpu *vcpu);
void rmap_remove(struct kvm *kvm, u64 *spte);
+
+struct slot_rmap_walk_iterator {
+ /* input fields. */
+ const struct kvm_memory_slot *slot;
+ gfn_t start_gfn;
+ gfn_t end_gfn;
+ int start_level;
+ int end_level;
+
+ /* output fields. */
+ gfn_t gfn;
+ struct kvm_rmap_head *rmap;
+ int level;
+
+ /* private field. */
+ struct kvm_rmap_head *end_rmap;
+};
+
+void rmap_walk_init_level(struct slot_rmap_walk_iterator *iterator, int level);
+void slot_rmap_walk_init(struct slot_rmap_walk_iterator *iterator,
+ const struct kvm_memory_slot *slot, int start_level,
+ int end_level, gfn_t start_gfn, gfn_t end_gfn);
+bool slot_rmap_walk_okay(struct slot_rmap_walk_iterator *iterator);
+void slot_rmap_walk_next(struct slot_rmap_walk_iterator *iterator);
+
+#define for_each_slot_rmap_range(_slot_, _start_level_, _end_level_, \
+ _start_gfn, _end_gfn, _iter_) \
+ for (slot_rmap_walk_init(_iter_, _slot_, _start_level_, \
+ _end_level_, _start_gfn, _end_gfn); \
+ slot_rmap_walk_okay(_iter_); \
+ slot_rmap_walk_next(_iter_))
+
+typedef bool (*rmap_handler_t)(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+ struct kvm_memory_slot *slot, gfn_t gfn,
+ int level, pte_t pte);
+
#endif /* __KVM_X86_MMU_RMAP_H */
--
2.39.0.rc0.267.gcb52ba06e7-goog
^ permalink raw reply related [flat|nested] 21+ messages in thread
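[Editorial note: the control flow of the slot rmap walk iterator moved in the patch above can be modeled in userspace. The sketch below is a toy, not the kernel's code: `struct walk_iter`, the fixed four-heads-per-level layout, and all names are made up for illustration. It only mirrors the skip-empty / advance-level logic of slot_rmap_walk_next().]

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the KVM types; four rmap heads per level. */
struct rmap_head { unsigned long val; };

struct walk_iter {
	struct rmap_head *rmap;		/* current position, NULL when done */
	struct rmap_head *end_rmap;	/* inclusive end of current level */
	int level;
	int end_level;
	struct rmap_head (*levels)[4];
};

static void walk_init_level(struct walk_iter *it, int level)
{
	it->level = level;
	it->rmap = &it->levels[level][0];
	it->end_rmap = &it->levels[level][3];
}

/*
 * Mirrors slot_rmap_walk_next(): advance within the level, skipping
 * empty heads; when the level is exhausted, move to the next level or
 * terminate the walk by setting rmap to NULL.
 */
static void walk_next(struct walk_iter *it)
{
	while (++it->rmap <= it->end_rmap) {
		if (it->rmap->val)
			return;
	}
	if (++it->level > it->end_level) {
		it->rmap = NULL;
		return;
	}
	walk_init_level(it, it->level);
}
```

A caller loops `for (walk_init_level(&it, 0); it.rmap; walk_next(&it))`, which is the same shape as the for_each_slot_rmap_range() macro in the patch.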
* [PATCH 6/7] KVM: x86/MMU: Move rmap zap operations to rmap.c
2022-12-06 17:35 [PATCH 0/7] KVM: x86/MMU: Factor rmap operations out of mmu.c Ben Gardon
` (4 preceding siblings ...)
2022-12-06 17:35 ` [PATCH 5/7] KVM: x86/MMU: Move the rmap walk iterator out of mmu.c Ben Gardon
@ 2022-12-06 17:36 ` Ben Gardon
2022-12-09 22:44 ` David Matlack
2022-12-06 17:36 ` [PATCH 7/7] KVM: x86/MMU: Move rmap_add() " Ben Gardon
2022-12-09 23:14 ` [PATCH 0/7] KVM: x86/MMU: Factor rmap operations out of mmu.c David Matlack
7 siblings, 1 reply; 21+ messages in thread
From: Ben Gardon @ 2022-12-06 17:36 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, David Matlack,
Vipin Sharma, Ben Gardon
Move the various rmap zap functions to rmap.c. These functions are less
"pure" rmap operations in that they also contain some SPTE manipulation;
however, they are mostly about rmap / pte list manipulation.
No functional change intended.
Signed-off-by: Ben Gardon <bgardon@google.com>
---
arch/x86/kvm/mmu/mmu.c | 51 +--------------------------------
arch/x86/kvm/mmu/mmu_internal.h | 1 +
arch/x86/kvm/mmu/rmap.c | 50 +++++++++++++++++++++++++++++++-
arch/x86/kvm/mmu/rmap.h | 9 +++++-
4 files changed, 59 insertions(+), 52 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 88da2abc2375..12082314d82d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -512,7 +512,7 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte)
* state bits, it is used to clear the last level sptep.
* Returns the old PTE.
*/
-static u64 mmu_spte_clear_track_bits(struct kvm *kvm, u64 *sptep)
+u64 mmu_spte_clear_track_bits(struct kvm *kvm, u64 *sptep)
{
kvm_pfn_t pfn;
u64 old_spte = *sptep;
@@ -855,42 +855,6 @@ gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t gfn,
return slot;
}
-static void kvm_zap_one_rmap_spte(struct kvm *kvm,
- struct kvm_rmap_head *rmap_head, u64 *sptep)
-{
- mmu_spte_clear_track_bits(kvm, sptep);
- pte_list_remove(sptep, rmap_head);
-}
-
-/* Return true if at least one SPTE was zapped, false otherwise */
-static bool kvm_zap_all_rmap_sptes(struct kvm *kvm,
- struct kvm_rmap_head *rmap_head)
-{
- struct pte_list_desc *desc, *next;
- int i;
-
- if (!rmap_head->val)
- return false;
-
- if (!(rmap_head->val & 1)) {
- mmu_spte_clear_track_bits(kvm, (u64 *)rmap_head->val);
- goto out;
- }
-
- desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
-
- for (; desc; desc = next) {
- for (i = 0; i < desc->spte_count; i++)
- mmu_spte_clear_track_bits(kvm, desc->sptes[i]);
- next = desc->more;
- free_pte_list_desc(desc);
- }
-out:
- /* rmap_head is meaningless now, remember to reset it */
- rmap_head->val = 0;
- return true;
-}
-
static void drop_spte(struct kvm *kvm, u64 *sptep)
{
u64 old_spte = mmu_spte_clear_track_bits(kvm, sptep);
@@ -1145,19 +1109,6 @@ static bool kvm_vcpu_write_protect_gfn(struct kvm_vcpu *vcpu, u64 gfn)
return kvm_mmu_slot_gfn_write_protect(vcpu->kvm, slot, gfn, PG_LEVEL_4K);
}
-static bool __kvm_zap_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
- const struct kvm_memory_slot *slot)
-{
- return kvm_zap_all_rmap_sptes(kvm, rmap_head);
-}
-
-static bool kvm_zap_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
- struct kvm_memory_slot *slot, gfn_t gfn, int level,
- pte_t unused)
-{
- return __kvm_zap_rmap(kvm, rmap_head, slot);
-}
-
static bool kvm_set_pte_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
struct kvm_memory_slot *slot, gfn_t gfn, int level,
pte_t pte)
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 3de703c2a5d4..a219c8e556e9 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -319,4 +319,5 @@ void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp);
void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp);
gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index);
+u64 mmu_spte_clear_track_bits(struct kvm *kvm, u64 *sptep);
#endif /* __KVM_X86_MMU_INTERNAL_H */
diff --git a/arch/x86/kvm/mmu/rmap.c b/arch/x86/kvm/mmu/rmap.c
index 91af5b32cffb..9cc4252aaabb 100644
--- a/arch/x86/kvm/mmu/rmap.c
+++ b/arch/x86/kvm/mmu/rmap.c
@@ -56,7 +56,7 @@ int pte_list_add(struct kvm_mmu_memory_cache *cache, u64 *spte,
return count;
}
-void free_pte_list_desc(struct pte_list_desc *pte_list_desc)
+static void free_pte_list_desc(struct pte_list_desc *pte_list_desc)
{
kmem_cache_free(pte_list_desc_cache, pte_list_desc);
}
@@ -283,3 +283,51 @@ void slot_rmap_walk_next(struct slot_rmap_walk_iterator *iterator)
rmap_walk_init_level(iterator, iterator->level);
}
+
+void kvm_zap_one_rmap_spte(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+ u64 *sptep)
+{
+ mmu_spte_clear_track_bits(kvm, sptep);
+ pte_list_remove(sptep, rmap_head);
+}
+
+/* Return true if at least one SPTE was zapped, false otherwise */
+bool kvm_zap_all_rmap_sptes(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
+{
+ struct pte_list_desc *desc, *next;
+ int i;
+
+ if (!rmap_head->val)
+ return false;
+
+ if (!(rmap_head->val & 1)) {
+ mmu_spte_clear_track_bits(kvm, (u64 *)rmap_head->val);
+ goto out;
+ }
+
+ desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
+
+ for (; desc; desc = next) {
+ for (i = 0; i < desc->spte_count; i++)
+ mmu_spte_clear_track_bits(kvm, desc->sptes[i]);
+ next = desc->more;
+ free_pte_list_desc(desc);
+ }
+out:
+ /* rmap_head is meaningless now, remember to reset it */
+ rmap_head->val = 0;
+ return true;
+}
+
+bool __kvm_zap_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+ const struct kvm_memory_slot *slot)
+{
+ return kvm_zap_all_rmap_sptes(kvm, rmap_head);
+}
+
+bool kvm_zap_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+ struct kvm_memory_slot *slot, gfn_t gfn, int level,
+ pte_t unused)
+{
+ return __kvm_zap_rmap(kvm, rmap_head, slot);
+}
diff --git a/arch/x86/kvm/mmu/rmap.h b/arch/x86/kvm/mmu/rmap.h
index dc4bf7e609ec..a9bf48494e1a 100644
--- a/arch/x86/kvm/mmu/rmap.h
+++ b/arch/x86/kvm/mmu/rmap.h
@@ -27,7 +27,6 @@ static struct kmem_cache *pte_list_desc_cache;
int pte_list_add(struct kvm_mmu_memory_cache *cache, u64 *spte,
struct kvm_rmap_head *rmap_head);
-void free_pte_list_desc(struct pte_list_desc *pte_list_desc);
void pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head);
unsigned int pte_list_count(struct kvm_rmap_head *rmap_head);
@@ -90,4 +89,12 @@ typedef bool (*rmap_handler_t)(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
struct kvm_memory_slot *slot, gfn_t gfn,
int level, pte_t pte);
+void kvm_zap_one_rmap_spte(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+ u64 *sptep);
+bool kvm_zap_all_rmap_sptes(struct kvm *kvm, struct kvm_rmap_head *rmap_head);
+bool __kvm_zap_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+ const struct kvm_memory_slot *slot);
+bool kvm_zap_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+ struct kvm_memory_slot *slot, gfn_t gfn, int level,
+ pte_t unused);
#endif /* __KVM_X86_MMU_RMAP_H */
--
2.39.0.rc0.267.gcb52ba06e7-goog
^ permalink raw reply related [flat|nested] 21+ messages in thread
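[Editorial note: kvm_zap_all_rmap_sptes() above tests `rmap_head->val & 1` — the "fiddly bit flag" mentioned in the cover letter. The encoding can be demonstrated in isolation; the toy names below (`toy_desc`, `encode_*`) are illustrative, not the kernel's, and the sketch assumes only that descriptors are at least 2-byte aligned so bit zero is free for tagging.]

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a pte_list_desc; real alignment makes bit 0 available. */
struct toy_desc {
	struct toy_desc *more;
	uint64_t spte_count;
};

/* Bit 0 clear: val is the lone spte pointer in the chain. */
static unsigned long encode_single(uint64_t *spte)
{
	return (unsigned long)spte;
}

/* Bit 0 set: (val & ~1) points to a descriptor with more mappings. */
static unsigned long encode_desc(struct toy_desc *desc)
{
	return (unsigned long)desc | 1;
}

static int head_is_desc(unsigned long val)
{
	return (val & 1) != 0;
}

static struct toy_desc *decode_desc(unsigned long val)
{
	return (struct toy_desc *)(val & ~1ul);
}
```

This tagged-pointer trick is why the common single-mapping case costs no extra allocation: the rmap head itself holds the spte pointer until a second mapping arrives.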
* [PATCH 7/7] KVM: x86/MMU: Move rmap_add() to rmap.c
2022-12-06 17:35 [PATCH 0/7] KVM: x86/MMU: Factor rmap operations out of mmu.c Ben Gardon
` (5 preceding siblings ...)
2022-12-06 17:36 ` [PATCH 6/7] KVM: x86/MMU: Move rmap zap operations to rmap.c Ben Gardon
@ 2022-12-06 17:36 ` Ben Gardon
2022-12-09 23:27 ` David Matlack
2022-12-09 23:14 ` [PATCH 0/7] KVM: x86/MMU: Factor rmap operations out of mmu.c David Matlack
7 siblings, 1 reply; 21+ messages in thread
From: Ben Gardon @ 2022-12-06 17:36 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, David Matlack,
Vipin Sharma, Ben Gardon
Move rmap_add() to rmap.c to complete the migration of the various rmap
operations out of mmu.c.
No functional change intended.
Signed-off-by: Ben Gardon <bgardon@google.com>
---
arch/x86/kvm/mmu/mmu.c | 45 ++++-----------------------------
arch/x86/kvm/mmu/mmu_internal.h | 6 +++++
arch/x86/kvm/mmu/rmap.c | 37 ++++++++++++++++++++++++++-
arch/x86/kvm/mmu/rmap.h | 8 +++++-
4 files changed, 54 insertions(+), 42 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 12082314d82d..b122c90a3e5f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -215,13 +215,13 @@ static struct kvm_mmu_role_regs vcpu_to_role_regs(struct kvm_vcpu *vcpu)
return regs;
}
-static inline bool kvm_available_flush_tlb_with_range(void)
+inline bool kvm_available_flush_tlb_with_range(void)
{
return kvm_x86_ops.tlb_remote_flush_with_range;
}
-static void kvm_flush_remote_tlbs_with_range(struct kvm *kvm,
- struct kvm_tlb_range *range)
+void kvm_flush_remote_tlbs_with_range(struct kvm *kvm,
+ struct kvm_tlb_range *range)
{
int ret = -ENOTSUPP;
@@ -695,8 +695,8 @@ static u32 kvm_mmu_page_get_access(struct kvm_mmu_page *sp, int index)
return sp->role.access;
}
-static void kvm_mmu_page_set_translation(struct kvm_mmu_page *sp, int index,
- gfn_t gfn, unsigned int access)
+void kvm_mmu_page_set_translation(struct kvm_mmu_page *sp, int index,
+ gfn_t gfn, unsigned int access)
{
if (sp_has_gptes(sp)) {
sp->shadowed_translation[index] = (gfn << PAGE_SHIFT) | access;
@@ -1217,41 +1217,6 @@ static bool kvm_test_age_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
return false;
}
-#define RMAP_RECYCLE_THRESHOLD 1000
-
-static void __rmap_add(struct kvm *kvm,
- struct kvm_mmu_memory_cache *cache,
- const struct kvm_memory_slot *slot,
- u64 *spte, gfn_t gfn, unsigned int access)
-{
- struct kvm_mmu_page *sp;
- struct kvm_rmap_head *rmap_head;
- int rmap_count;
-
- sp = sptep_to_sp(spte);
- kvm_mmu_page_set_translation(sp, spte_index(spte), gfn, access);
- kvm_update_page_stats(kvm, sp->role.level, 1);
-
- rmap_head = gfn_to_rmap(gfn, sp->role.level, slot);
- rmap_count = pte_list_add(cache, spte, rmap_head);
-
- if (rmap_count > kvm->stat.max_mmu_rmap_size)
- kvm->stat.max_mmu_rmap_size = rmap_count;
- if (rmap_count > RMAP_RECYCLE_THRESHOLD) {
- kvm_zap_all_rmap_sptes(kvm, rmap_head);
- kvm_flush_remote_tlbs_with_address(
- kvm, sp->gfn, KVM_PAGES_PER_HPAGE(sp->role.level));
- }
-}
-
-static void rmap_add(struct kvm_vcpu *vcpu, const struct kvm_memory_slot *slot,
- u64 *spte, gfn_t gfn, unsigned int access)
-{
- struct kvm_mmu_memory_cache *cache = &vcpu->arch.mmu_pte_list_desc_cache;
-
- __rmap_add(vcpu->kvm, cache, slot, spte, gfn, access);
-}
-
bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
bool young = false;
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index a219c8e556e9..03da1f8b066e 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -320,4 +320,10 @@ void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp);
gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index);
u64 mmu_spte_clear_track_bits(struct kvm *kvm, u64 *sptep);
+void kvm_mmu_page_set_translation(struct kvm_mmu_page *sp, int index,
+ gfn_t gfn, unsigned int access);
+
+inline bool kvm_available_flush_tlb_with_range(void);
+void kvm_flush_remote_tlbs_with_range(struct kvm *kvm,
+ struct kvm_tlb_range *range);
#endif /* __KVM_X86_MMU_INTERNAL_H */
diff --git a/arch/x86/kvm/mmu/rmap.c b/arch/x86/kvm/mmu/rmap.c
index 9cc4252aaabb..136c5f4f867b 100644
--- a/arch/x86/kvm/mmu/rmap.c
+++ b/arch/x86/kvm/mmu/rmap.c
@@ -292,7 +292,8 @@ void kvm_zap_one_rmap_spte(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
}
/* Return true if at least one SPTE was zapped, false otherwise */
-bool kvm_zap_all_rmap_sptes(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
+static bool kvm_zap_all_rmap_sptes(struct kvm *kvm,
+ struct kvm_rmap_head *rmap_head)
{
struct pte_list_desc *desc, *next;
int i;
@@ -331,3 +332,37 @@ bool kvm_zap_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
{
return __kvm_zap_rmap(kvm, rmap_head, slot);
}
+
+#define RMAP_RECYCLE_THRESHOLD 1000
+
+void __rmap_add(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
+ const struct kvm_memory_slot *slot, u64 *spte, gfn_t gfn,
+ unsigned int access)
+{
+ struct kvm_mmu_page *sp;
+ struct kvm_rmap_head *rmap_head;
+ int rmap_count;
+
+ sp = sptep_to_sp(spte);
+ kvm_mmu_page_set_translation(sp, spte_index(spte), gfn, access);
+ kvm_update_page_stats(kvm, sp->role.level, 1);
+
+ rmap_head = gfn_to_rmap(gfn, sp->role.level, slot);
+ rmap_count = pte_list_add(cache, spte, rmap_head);
+
+ if (rmap_count > kvm->stat.max_mmu_rmap_size)
+ kvm->stat.max_mmu_rmap_size = rmap_count;
+ if (rmap_count > RMAP_RECYCLE_THRESHOLD) {
+ kvm_zap_all_rmap_sptes(kvm, rmap_head);
+ kvm_flush_remote_tlbs_with_address(
+ kvm, sp->gfn, KVM_PAGES_PER_HPAGE(sp->role.level));
+ }
+}
+
+void rmap_add(struct kvm_vcpu *vcpu, const struct kvm_memory_slot *slot,
+ u64 *spte, gfn_t gfn, unsigned int access)
+{
+ struct kvm_mmu_memory_cache *cache = &vcpu->arch.mmu_pte_list_desc_cache;
+
+ __rmap_add(vcpu->kvm, cache, slot, spte, gfn, access);
+}
diff --git a/arch/x86/kvm/mmu/rmap.h b/arch/x86/kvm/mmu/rmap.h
index a9bf48494e1a..b06897dad76a 100644
--- a/arch/x86/kvm/mmu/rmap.h
+++ b/arch/x86/kvm/mmu/rmap.h
@@ -91,10 +91,16 @@ typedef bool (*rmap_handler_t)(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
void kvm_zap_one_rmap_spte(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
u64 *sptep);
-bool kvm_zap_all_rmap_sptes(struct kvm *kvm, struct kvm_rmap_head *rmap_head);
bool __kvm_zap_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
const struct kvm_memory_slot *slot);
bool kvm_zap_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
struct kvm_memory_slot *slot, gfn_t gfn, int level,
pte_t unused);
+
+void __rmap_add(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
+ const struct kvm_memory_slot *slot, u64 *spte, gfn_t gfn,
+ unsigned int access);
+void rmap_add(struct kvm_vcpu *vcpu, const struct kvm_memory_slot *slot,
+ u64 *spte, gfn_t gfn, unsigned int access);
+
#endif /* __KVM_X86_MMU_RMAP_H */
--
2.39.0.rc0.267.gcb52ba06e7-goog
^ permalink raw reply related [flat|nested] 21+ messages in thread
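[Editorial note: the RMAP_RECYCLE_THRESHOLD behavior in __rmap_add() above can be reduced to a small userspace sketch. The types and the small threshold below are made up for the demo (the kernel uses 1000); the point is only the shape of the logic — once a chain grows past the threshold, the whole chain is zapped (and, in the kernel, remote TLBs flushed) to bound the cost of future rmap walks.]

```c
#include <assert.h>

#define RECYCLE_THRESHOLD 4	/* kernel: RMAP_RECYCLE_THRESHOLD 1000 */

/* Toy rmap: just a count of mapped sptes. */
struct toy_rmap {
	int count;
};

/* Returns the count after adding, mirroring pte_list_add()'s return
 * being compared against the threshold; stands in for
 * kvm_zap_all_rmap_sptes() + TLB flush by resetting the count. */
static int toy_rmap_add(struct toy_rmap *r)
{
	int count = ++r->count;

	if (count > RECYCLE_THRESHOLD)
		r->count = 0;
	return count;
}
```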
* Re: [PATCH 1/7] KVM: x86/MMU: Move pte_list operations to rmap.c
2022-12-06 17:35 ` [PATCH 1/7] KVM: x86/MMU: Move pte_list operations to rmap.c Ben Gardon
@ 2022-12-07 22:58 ` Vipin Sharma
2022-12-14 0:11 ` Ben Gardon
2022-12-09 22:22 ` David Matlack
1 sibling, 1 reply; 21+ messages in thread
From: Vipin Sharma @ 2022-12-07 22:58 UTC (permalink / raw)
To: Ben Gardon
Cc: linux-kernel, kvm, Paolo Bonzini, Peter Xu, Sean Christopherson,
David Matlack
On Tue, Dec 6, 2022 at 9:36 AM Ben Gardon <bgardon@google.com> wrote:
>
> In the interest of eventually splitting the Shadow MMU out of mmu.c,
> start by moving some of the operations for manipulating pte_lists out of
> mmu.c and into a new pair of files: rmap.c and rmap.h.
>
> No functional change intended.
>
> Signed-off-by: Ben Gardon <bgardon@google.com>
> ---
> arch/x86/kvm/Makefile | 2 +-
> arch/x86/kvm/debugfs.c | 1 +
> arch/x86/kvm/mmu/mmu.c | 152 +-------------------------------
> arch/x86/kvm/mmu/mmu_internal.h | 1 -
> arch/x86/kvm/mmu/rmap.c | 141 +++++++++++++++++++++++++++++
> arch/x86/kvm/mmu/rmap.h | 34 +++++++
> 6 files changed, 179 insertions(+), 152 deletions(-)
> create mode 100644 arch/x86/kvm/mmu/rmap.c
> create mode 100644 arch/x86/kvm/mmu/rmap.h
>
> diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
> index 80e3fe184d17..9f766eebeddf 100644
> --- a/arch/x86/kvm/Makefile
> +++ b/arch/x86/kvm/Makefile
> @@ -12,7 +12,7 @@ include $(srctree)/virt/kvm/Makefile.kvm
> kvm-y += x86.o emulate.o i8259.o irq.o lapic.o \
> i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \
> hyperv.o debugfs.o mmu/mmu.o mmu/page_track.o \
> - mmu/spte.o
> + mmu/spte.o mmu/rmap.o
>
> ifdef CONFIG_HYPERV
> kvm-y += kvm_onhyperv.o
> diff --git a/arch/x86/kvm/debugfs.c b/arch/x86/kvm/debugfs.c
> index c1390357126a..29f692ecd6f3 100644
> --- a/arch/x86/kvm/debugfs.c
> +++ b/arch/x86/kvm/debugfs.c
> @@ -9,6 +9,7 @@
> #include "lapic.h"
> #include "mmu.h"
> #include "mmu/mmu_internal.h"
> +#include "mmu/rmap.h"
>
> static int vcpu_get_timer_advance_ns(void *data, u64 *val)
> {
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 4736d7849c60..90b3735d6064 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -26,6 +26,7 @@
> #include "kvm_emulate.h"
> #include "cpuid.h"
> #include "spte.h"
> +#include "rmap.h"
>
> #include <linux/kvm_host.h>
> #include <linux/types.h>
> @@ -112,24 +113,6 @@ module_param(dbg, bool, 0644);
>
> #include <trace/events/kvm.h>
>
> -/* make pte_list_desc fit well in cache lines */
> -#define PTE_LIST_EXT 14
> -
> -/*
> - * Slight optimization of cacheline layout, by putting `more' and `spte_count'
> - * at the start; then accessing it will only use one single cacheline for
> - * either full (entries==PTE_LIST_EXT) case or entries<=6.
> - */
> -struct pte_list_desc {
> - struct pte_list_desc *more;
> - /*
> - * Stores number of entries stored in the pte_list_desc. No need to be
> - * u64 but just for easier alignment. When PTE_LIST_EXT, means full.
> - */
> - u64 spte_count;
> - u64 *sptes[PTE_LIST_EXT];
> -};
> -
> struct kvm_shadow_walk_iterator {
> u64 addr;
> hpa_t shadow_addr;
> @@ -155,7 +138,6 @@ struct kvm_shadow_walk_iterator {
> ({ spte = mmu_spte_get_lockless(_walker.sptep); 1; }); \
> __shadow_walk_next(&(_walker), spte))
>
> -static struct kmem_cache *pte_list_desc_cache;
> struct kmem_cache *mmu_page_header_cache;
> static struct percpu_counter kvm_total_used_mmu_pages;
>
> @@ -674,11 +656,6 @@ static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
> kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
> }
>
> -static void mmu_free_pte_list_desc(struct pte_list_desc *pte_list_desc)
> -{
> - kmem_cache_free(pte_list_desc_cache, pte_list_desc);
> -}
> -
> static bool sp_has_gptes(struct kvm_mmu_page *sp);
>
> static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
> @@ -878,111 +855,6 @@ gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t gfn,
> return slot;
> }
>
> -/*
> - * About rmap_head encoding:
> - *
> - * If the bit zero of rmap_head->val is clear, then it points to the only spte
> - * in this rmap chain. Otherwise, (rmap_head->val & ~1) points to a struct
> - * pte_list_desc containing more mappings.
> - */
> -
> -/*
> - * Returns the number of pointers in the rmap chain, not counting the new one.
> - */
> -static int pte_list_add(struct kvm_mmu_memory_cache *cache, u64 *spte,
> - struct kvm_rmap_head *rmap_head)
> -{
> - struct pte_list_desc *desc;
> - int count = 0;
> -
> - if (!rmap_head->val) {
> - rmap_printk("%p %llx 0->1\n", spte, *spte);
> - rmap_head->val = (unsigned long)spte;
> - } else if (!(rmap_head->val & 1)) {
> - rmap_printk("%p %llx 1->many\n", spte, *spte);
> - desc = kvm_mmu_memory_cache_alloc(cache);
> - desc->sptes[0] = (u64 *)rmap_head->val;
> - desc->sptes[1] = spte;
> - desc->spte_count = 2;
> - rmap_head->val = (unsigned long)desc | 1;
> - ++count;
> - } else {
> - rmap_printk("%p %llx many->many\n", spte, *spte);
> - desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
> - while (desc->spte_count == PTE_LIST_EXT) {
> - count += PTE_LIST_EXT;
> - if (!desc->more) {
> - desc->more = kvm_mmu_memory_cache_alloc(cache);
> - desc = desc->more;
> - desc->spte_count = 0;
> - break;
> - }
> - desc = desc->more;
> - }
> - count += desc->spte_count;
> - desc->sptes[desc->spte_count++] = spte;
> - }
> - return count;
> -}
> -
> -static void
> -pte_list_desc_remove_entry(struct kvm_rmap_head *rmap_head,
> - struct pte_list_desc *desc, int i,
> - struct pte_list_desc *prev_desc)
> -{
> - int j = desc->spte_count - 1;
> -
> - desc->sptes[i] = desc->sptes[j];
> - desc->sptes[j] = NULL;
> - desc->spte_count--;
> - if (desc->spte_count)
> - return;
> - if (!prev_desc && !desc->more)
> - rmap_head->val = 0;
> - else
> - if (prev_desc)
> - prev_desc->more = desc->more;
> - else
> - rmap_head->val = (unsigned long)desc->more | 1;
> - mmu_free_pte_list_desc(desc);
> -}
> -
> -static void pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head)
> -{
> - struct pte_list_desc *desc;
> - struct pte_list_desc *prev_desc;
> - int i;
> -
> - if (!rmap_head->val) {
> - pr_err("%s: %p 0->BUG\n", __func__, spte);
> - BUG();
> - } else if (!(rmap_head->val & 1)) {
> - rmap_printk("%p 1->0\n", spte);
> - if ((u64 *)rmap_head->val != spte) {
> - pr_err("%s: %p 1->BUG\n", __func__, spte);
> - BUG();
> - }
> - rmap_head->val = 0;
> - } else {
> - rmap_printk("%p many->many\n", spte);
> - desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
> - prev_desc = NULL;
> - while (desc) {
> - for (i = 0; i < desc->spte_count; ++i) {
> - if (desc->sptes[i] == spte) {
> - pte_list_desc_remove_entry(rmap_head,
> - desc, i, prev_desc);
> - return;
> - }
> - }
> - prev_desc = desc;
> - desc = desc->more;
> - }
> - pr_err("%s: %p many->many\n", __func__, spte);
> - BUG();
> - }
> -}
> -
> static void kvm_zap_one_rmap_spte(struct kvm *kvm,
> struct kvm_rmap_head *rmap_head, u64 *sptep)
> {
> @@ -1011,7 +883,7 @@ static bool kvm_zap_all_rmap_sptes(struct kvm *kvm,
> for (i = 0; i < desc->spte_count; i++)
> mmu_spte_clear_track_bits(kvm, desc->sptes[i]);
> next = desc->more;
> - mmu_free_pte_list_desc(desc);
> + free_pte_list_desc(desc);
> }
> out:
> /* rmap_head is meaningless now, remember to reset it */
> @@ -1019,26 +891,6 @@ static bool kvm_zap_all_rmap_sptes(struct kvm *kvm,
> return true;
> }
>
> -unsigned int pte_list_count(struct kvm_rmap_head *rmap_head)
> -{
> - struct pte_list_desc *desc;
> - unsigned int count = 0;
> -
> - if (!rmap_head->val)
> - return 0;
> - else if (!(rmap_head->val & 1))
> - return 1;
> -
> - desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
> -
> - while (desc) {
> - count += desc->spte_count;
> - desc = desc->more;
> - }
> -
> - return count;
> -}
> -
> static struct kvm_rmap_head *gfn_to_rmap(gfn_t gfn, int level,
> const struct kvm_memory_slot *slot)
> {
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index dbaf6755c5a7..cd1c8f32269d 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -166,7 +166,6 @@ bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
> int min_level);
> void kvm_flush_remote_tlbs_with_address(struct kvm *kvm,
> u64 start_gfn, u64 pages);
> -unsigned int pte_list_count(struct kvm_rmap_head *rmap_head);
>
> extern int nx_huge_pages;
> static inline bool is_nx_huge_page_enabled(struct kvm *kvm)
> diff --git a/arch/x86/kvm/mmu/rmap.c b/arch/x86/kvm/mmu/rmap.c
> new file mode 100644
> index 000000000000..daa99dee0709
> --- /dev/null
> +++ b/arch/x86/kvm/mmu/rmap.c
> @@ -0,0 +1,141 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
A comment here would be nice, describing the expectations for this file
and what code lives in it.
> +#include "mmu.h"
> +#include "mmu_internal.h"
> +#include "mmutrace.h"
> +#include "rmap.h"
> +#include "spte.h"
> +
> +#include <asm/cmpxchg.h>
> +#include <trace/events/kvm.h>
> +
> +/*
> + * About rmap_head encoding:
> + *
> + * If the bit zero of rmap_head->val is clear, then it points to the only spte
> + * in this rmap chain. Otherwise, (rmap_head->val & ~1) points to a struct
> + * pte_list_desc containing more mappings.
> + */
> +
> +/*
> + * Returns the number of pointers in the rmap chain, not counting the new one.
> + */
> +int pte_list_add(struct kvm_mmu_memory_cache *cache, u64 *spte,
> + struct kvm_rmap_head *rmap_head)
> +{
> + struct pte_list_desc *desc;
> + int count = 0;
> +
> + if (!rmap_head->val) {
> + rmap_printk("%p %llx 0->1\n", spte, *spte);
> + rmap_head->val = (unsigned long)spte;
> + } else if (!(rmap_head->val & 1)) {
> + rmap_printk("%p %llx 1->many\n", spte, *spte);
> + desc = kvm_mmu_memory_cache_alloc(cache);
> + desc->sptes[0] = (u64 *)rmap_head->val;
> + desc->sptes[1] = spte;
> + desc->spte_count = 2;
> + rmap_head->val = (unsigned long)desc | 1;
> + ++count;
> + } else {
> + rmap_printk("%p %llx many->many\n", spte, *spte);
> + desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
> + while (desc->spte_count == PTE_LIST_EXT) {
> + count += PTE_LIST_EXT;
> + if (!desc->more) {
> + desc->more = kvm_mmu_memory_cache_alloc(cache);
> + desc = desc->more;
> + desc->spte_count = 0;
> + break;
> + }
> + desc = desc->more;
> + }
> + count += desc->spte_count;
> + desc->sptes[desc->spte_count++] = spte;
> + }
> + return count;
> +}
> +
> +void free_pte_list_desc(struct pte_list_desc *pte_list_desc)
> +{
> + kmem_cache_free(pte_list_desc_cache, pte_list_desc);
> +}
> +
> +static void
> +pte_list_desc_remove_entry(struct kvm_rmap_head *rmap_head,
> + struct pte_list_desc *desc, int i,
> + struct pte_list_desc *prev_desc)
> +{
> + int j = desc->spte_count - 1;
> +
> + desc->sptes[i] = desc->sptes[j];
> + desc->sptes[j] = NULL;
> + desc->spte_count--;
> + if (desc->spte_count)
> + return;
> + if (!prev_desc && !desc->more)
> + rmap_head->val = 0;
> + else
> + if (prev_desc)
> + prev_desc->more = desc->more;
> + else
> + rmap_head->val = (unsigned long)desc->more | 1;
> + free_pte_list_desc(desc);
> +}
> +
> +void pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head)
> +{
> + struct pte_list_desc *desc;
> + struct pte_list_desc *prev_desc;
> + int i;
> +
> + if (!rmap_head->val) {
> + pr_err("%s: %p 0->BUG\n", __func__, spte);
> + BUG();
> + } else if (!(rmap_head->val & 1)) {
> + rmap_printk("%p 1->0\n", spte);
> + if ((u64 *)rmap_head->val != spte) {
> + pr_err("%s: %p 1->BUG\n", __func__, spte);
> + BUG();
> + }
> + rmap_head->val = 0;
> + } else {
> + rmap_printk("%p many->many\n", spte);
> + desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
> + prev_desc = NULL;
> + while (desc) {
> + for (i = 0; i < desc->spte_count; ++i) {
> + if (desc->sptes[i] == spte) {
> + pte_list_desc_remove_entry(rmap_head,
> + desc, i, prev_desc);
> + return;
> + }
> + }
> + prev_desc = desc;
> + desc = desc->more;
> + }
> + pr_err("%s: %p many->many\n", __func__, spte);
> + BUG();
> + }
> +}
> +
> +unsigned int pte_list_count(struct kvm_rmap_head *rmap_head)
> +{
> + struct pte_list_desc *desc;
> + unsigned int count = 0;
> +
> + if (!rmap_head->val)
> + return 0;
> + else if (!(rmap_head->val & 1))
> + return 1;
> +
> + desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
> +
> + while (desc) {
> + count += desc->spte_count;
> + desc = desc->more;
> + }
> +
> + return count;
> +}
> +
> diff --git a/arch/x86/kvm/mmu/rmap.h b/arch/x86/kvm/mmu/rmap.h
> new file mode 100644
> index 000000000000..059765b6e066
> --- /dev/null
> +++ b/arch/x86/kvm/mmu/rmap.h
> @@ -0,0 +1,34 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#ifndef __KVM_X86_MMU_RMAP_H
> +#define __KVM_X86_MMU_RMAP_H
> +
> +#include <linux/kvm_host.h>
> +
> +/* make pte_list_desc fit well in cache lines */
> +#define PTE_LIST_EXT 14
> +
> +/*
> + * Slight optimization of cacheline layout, by putting `more' and `spte_count'
> + * at the start; then accessing it will only use one single cacheline for
> + * either full (entries==PTE_LIST_EXT) case or entries<=6.
> + */
> +struct pte_list_desc {
> + struct pte_list_desc *more;
> + /*
> + * Stores number of entries stored in the pte_list_desc. No need to be
> + * u64 but just for easier alignment. When PTE_LIST_EXT, means full.
> + */
> + u64 spte_count;
> + u64 *sptes[PTE_LIST_EXT];
> +};
> +
> +static struct kmem_cache *pte_list_desc_cache;
Does it make sense to make this non-static and declare it extern here?
Also, you could provide an init function which can be called from mmu.c.
> +
> +int pte_list_add(struct kvm_mmu_memory_cache *cache, u64 *spte,
> + struct kvm_rmap_head *rmap_head);
> +void free_pte_list_desc(struct pte_list_desc *pte_list_desc);
> +void pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head);
> +unsigned int pte_list_count(struct kvm_rmap_head *rmap_head);
> +
Similar to the tdp_mmu naming, and to the other rmap functions in later
patches in this series, should the functions above be prefixed with "rmap_"?
> +#endif /* __KVM_X86_MMU_RMAP_H */
> --
> 2.39.0.rc0.267.gcb52ba06e7-goog
>
^ permalink raw reply [flat|nested] 21+ messages in thread
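[Editorial note: the pte_list_desc chain discussed in the review above can be modeled in userspace. The sketch below is illustrative only — the small block size, the names, and the prepend-on-overflow policy are toy choices (the kernel appends to the tail of the chain and sizes PTE_LIST_EXT to fit cache lines); it only demonstrates the blocks-linked-through-`more` shape that pte_list_add() and pte_list_count() walk.]

```c
#include <assert.h>
#include <stdlib.h>

#define EXT 3	/* kernel: PTE_LIST_EXT == 14, sized for cache lines */

/* Toy pte_list_desc: fixed-size blocks of spte slots linked via more. */
struct desc {
	struct desc *more;
	int spte_count;
	unsigned long sptes[EXT];
};

/* Add a spte, allocating a new block when the current one is full.
 * (Toy policy: prepend; the kernel appends at the chain's tail.) */
static void chain_add(struct desc **head, unsigned long spte)
{
	struct desc *d = *head;

	if (!d || d->spte_count == EXT) {
		d = calloc(1, sizeof(*d));
		d->more = *head;
		*head = d;
	}
	d->sptes[d->spte_count++] = spte;
}

/* Mirrors pte_list_count(): sum spte_count across the chain. */
static int chain_count(const struct desc *d)
{
	int count = 0;

	for (; d; d = d->more)
		count += d->spte_count;
	return count;
}
```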
* Re: [PATCH 1/7] KVM: x86/MMU: Move pte_list operations to rmap.c
2022-12-06 17:35 ` [PATCH 1/7] KVM: x86/MMU: Move pte_list operations to rmap.c Ben Gardon
2022-12-07 22:58 ` Vipin Sharma
@ 2022-12-09 22:22 ` David Matlack
2022-12-14 0:07 ` Ben Gardon
1 sibling, 1 reply; 21+ messages in thread
From: David Matlack @ 2022-12-09 22:22 UTC (permalink / raw)
To: Ben Gardon
Cc: linux-kernel, kvm, Paolo Bonzini, Peter Xu, Sean Christopherson,
Vipin Sharma
On Tue, Dec 06, 2022 at 05:35:55PM +0000, Ben Gardon wrote:
> In the interest of eventually splitting the Shadow MMU out of mmu.c,
> start by moving some of the operations for manipulating pte_lists out of
> mmu.c and into a new pair of files: rmap.c and rmap.h.
>
> No functional change intended.
>
> Signed-off-by: Ben Gardon <bgardon@google.com>
> ---
[...]
> diff --git a/arch/x86/kvm/mmu/rmap.h b/arch/x86/kvm/mmu/rmap.h
> new file mode 100644
> index 000000000000..059765b6e066
> --- /dev/null
> +++ b/arch/x86/kvm/mmu/rmap.h
> @@ -0,0 +1,34 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#ifndef __KVM_X86_MMU_RMAP_H
> +#define __KVM_X86_MMU_RMAP_H
> +
> +#include <linux/kvm_host.h>
> +
> +/* make pte_list_desc fit well in cache lines */
> +#define PTE_LIST_EXT 14
> +
> +/*
> + * Slight optimization of cacheline layout, by putting `more' and `spte_count'
> + * at the start; then accessing it will only use one single cacheline for
> + * either full (entries==PTE_LIST_EXT) case or entries<=6.
> + */
> +struct pte_list_desc {
> + struct pte_list_desc *more;
> + /*
> + * Stores number of entries stored in the pte_list_desc. No need to be
> + * u64 but just for easier alignment. When PTE_LIST_EXT, means full.
> + */
> + u64 spte_count;
> + u64 *sptes[PTE_LIST_EXT];
> +};
> +
> +static struct kmem_cache *pte_list_desc_cache;
The definition of pte_list_desc_cache needs to go in a C file since it's
a global variable. Since it now needs to be accessed by more than one C
file, drop the static. Then it can be accessed with extern.
Since most of the code that sets up and deals with pte_list_desc_cache
is still in mmu.c, my vote is to keep the definition there.
i.e.
mmu.c:
struct kmem_cache *pte_list_desc_cache;
rmap.c
extern struct kmem_cache *pte_list_desc_cache;
And no need for anything in rmap.h.
* Re: [PATCH 6/7] KVM: x86/MMU: Move rmap zap operations to rmap.c
2022-12-06 17:36 ` [PATCH 6/7] KVM: x86/MMU: Move rmap zap operations to rmap.c Ben Gardon
@ 2022-12-09 22:44 ` David Matlack
0 siblings, 0 replies; 21+ messages in thread
From: David Matlack @ 2022-12-09 22:44 UTC (permalink / raw)
To: Ben Gardon
Cc: linux-kernel, kvm, Paolo Bonzini, Peter Xu, Sean Christopherson,
Vipin Sharma
On Tue, Dec 06, 2022 at 05:36:00PM +0000, Ben Gardon wrote:
> Move the various rmap zap functions to rmap.c. These functions are less
> "pure" rmap operations in that they also contain some SPTE manipulation,
> however they're mostly about rmap / pte list manipulation.
>
> No functional change intended.
>
> Signed-off-by: Ben Gardon <bgardon@google.com>
> ---
[...]
> -static void kvm_zap_one_rmap_spte(struct kvm *kvm,
> - struct kvm_rmap_head *rmap_head, u64 *sptep)
> -{
> - mmu_spte_clear_track_bits(kvm, sptep);
> - pte_list_remove(sptep, rmap_head);
> -}
> -
> -/* Return true if at least one SPTE was zapped, false otherwise */
> -static bool kvm_zap_all_rmap_sptes(struct kvm *kvm,
> - struct kvm_rmap_head *rmap_head)
> -{
> - struct pte_list_desc *desc, *next;
> - int i;
> -
> - if (!rmap_head->val)
> - return false;
> -
> - if (!(rmap_head->val & 1)) {
> - mmu_spte_clear_track_bits(kvm, (u64 *)rmap_head->val);
> - goto out;
> - }
> -
> - desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
> -
> - for (; desc; desc = next) {
> - for (i = 0; i < desc->spte_count; i++)
> - mmu_spte_clear_track_bits(kvm, desc->sptes[i]);
> - next = desc->more;
> - free_pte_list_desc(desc);
> - }
> -out:
> - /* rmap_head is meaningless now, remember to reset it */
> - rmap_head->val = 0;
> - return true;
> -}
I don't like moving the rmap zap functions into rmap.c, because they are
more mmu.c functions, as you note in the commit description. e.g. It's
odd to have kvm_zap_all_rmap_sptes() in rmap.c but not, say
__rmap_clear_dirty().
I get your point though that kvm_zap_all_rmap_sptes() has to know
intimate details of the pte_list_desc structure. It would be nice to
keep those details isolated to rmap.c.
What about keeping the zap functions mmu.c and just provide a better API
for kvm_zap_all_rmap_sptes() to process the rmap entries?
e.g.
mmu.c:
static bool kvm_zap_all_rmap_sptes(struct kvm *kvm,
struct kvm_rmap_head *rmap_head)
{
struct rmap_iterator iter;
bool flush = false;
u64 *sptep;
for_each_rmap_spte(rmap_head, &iter, sptep) {
mmu_spte_clear_track_bits(kvm, sptep);
flush = true;
}
pte_list_free_all(rmap_head); // <-- implemented in rmap.c
return flush;
}
This should be about as efficient as the current approach (same big-O
notation at least) and maintain the separation of pte_list_desc
internals in rmap.c.
* Re: [PATCH 2/7] KVM: x86/MMU: Move rmap_iterator to rmap.h
2022-12-06 17:35 ` [PATCH 2/7] KVM: x86/MMU: Move rmap_iterator to rmap.h Ben Gardon
@ 2022-12-09 23:04 ` David Matlack
2022-12-14 0:12 ` Ben Gardon
0 siblings, 1 reply; 21+ messages in thread
From: David Matlack @ 2022-12-09 23:04 UTC (permalink / raw)
To: Ben Gardon
Cc: linux-kernel, kvm, Paolo Bonzini, Peter Xu, Sean Christopherson,
Vipin Sharma
On Tue, Dec 06, 2022 at 05:35:56PM +0000, Ben Gardon wrote:
> In continuing to factor the rmap out of mmu.c, move the rmap_iterator
> and associated functions and macros into rmap.(c|h).
>
> No functional change intended.
>
> Signed-off-by: Ben Gardon <bgardon@google.com>
> ---
> arch/x86/kvm/mmu/mmu.c | 76 -----------------------------------------
> arch/x86/kvm/mmu/rmap.c | 61 +++++++++++++++++++++++++++++++++
> arch/x86/kvm/mmu/rmap.h | 18 ++++++++++
> 3 files changed, 79 insertions(+), 76 deletions(-)
>
[...]
> diff --git a/arch/x86/kvm/mmu/rmap.h b/arch/x86/kvm/mmu/rmap.h
> index 059765b6e066..13b265f3a95e 100644
> --- a/arch/x86/kvm/mmu/rmap.h
> +++ b/arch/x86/kvm/mmu/rmap.h
> @@ -31,4 +31,22 @@ void free_pte_list_desc(struct pte_list_desc *pte_list_desc);
> void pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head);
> unsigned int pte_list_count(struct kvm_rmap_head *rmap_head);
>
> +/*
> + * Used by the following functions to iterate through the sptes linked by a
> + * rmap. All fields are private and not assumed to be used outside.
> + */
> +struct rmap_iterator {
> + /* private fields */
> + struct pte_list_desc *desc; /* holds the sptep if not NULL */
> + int pos; /* index of the sptep */
> +};
> +
> +u64 *rmap_get_first(struct kvm_rmap_head *rmap_head,
> + struct rmap_iterator *iter);
> +u64 *rmap_get_next(struct rmap_iterator *iter);
> +
> +#define for_each_rmap_spte(_rmap_head_, _iter_, _spte_) \
> + for (_spte_ = rmap_get_first(_rmap_head_, _iter_); \
> + _spte_; _spte_ = rmap_get_next(_iter_))
> +
I always found these function names and kvm_rmap_head confusing since
they are about iterating through the pte_list_desc data structure. The
rmap (gfn -> list of sptes) is a specific application of the
pte_list_desc structure, but not the only application. There's also
parent_ptes in struct kvm_mmu_page, which is not an rmap, just a plain
old list of ptes.
While you are refactoring this code, what do you think about doing the
following renames?
struct kvm_rmap_head -> struct pte_list_head
struct rmap_iterator -> struct pte_list_iterator
rmap_get_first() -> pte_list_get_first()
rmap_get_next() -> pte_list_get_next()
for_each_rmap_spte() -> for_each_pte_list_entry()
Then we can reserve the term "rmap" just for the actual rmap
(slot->arch.rmap), and code that deals with sp->parent_ptes will become
a lot more clear IMO (because it will no longer mention rmap).
e.g. We go from this:
struct rmap_iterator iter;
u64 *sptep;
for_each_rmap_spte(&sp->parent_ptes, &iter, sptep) {
...
}
To this:
struct pte_list_iterator iter;
u64 *sptep;
for_each_pte_list_entry(&sp->parent_ptes, &iter, sptep) {
...
}
* Re: [PATCH 0/7] KVM: x86/MMU: Factor rmap operations out of mmu.c
2022-12-06 17:35 [PATCH 0/7] KVM: x86/MMU: Factor rmap operations out of mmu.c Ben Gardon
` (6 preceding siblings ...)
2022-12-06 17:36 ` [PATCH 7/7] KVM: x86/MMU: Move rmap_add() " Ben Gardon
@ 2022-12-09 23:14 ` David Matlack
7 siblings, 0 replies; 21+ messages in thread
From: David Matlack @ 2022-12-09 23:14 UTC (permalink / raw)
To: Ben Gardon
Cc: linux-kernel, kvm, Paolo Bonzini, Peter Xu, Sean Christopherson,
Vipin Sharma
On Tue, Dec 06, 2022 at 05:35:54PM +0000, Ben Gardon wrote:
> Move the basic operations for manipulating an rmap out of mmu.c, into
> their own files.
memslot_rmap_alloc() and memslot_rmap_free() from x86.c look like good
candidates to move to rmap.c as well as they are pure setup/teardown.
>
> This is the first step in a series of series to split the Shadow MMU out of
> mmu.c. Splitting the shadow MMU out results in about a 50% reduction in line
> count of mmu.c, from ~7K to ~3.5K. The rmap operations only represent about
> 10% of the Shadow MMU but are split out first because they are relatively
> self-contained.
>
> This split may also help facilitate the addition of KUnit tests for rmap
> manipulation, especially the fiddly bit flag which differentiates a direct
> pointer from a pte_list_desc.
>
> This effort is complementary to David Matlack's proposal to refactor
> the TDP MMU into an arch-neutral implementation because it further
> clarifies the distinction between the Shadow MMU and TDP MMU within x86.
>
> This series contains no functional changes. It's just a direct movement
> of code from one file to another.
>
> Whether this rmap code should stay in its own file or get merged with
> the rest of the shadow MMU code as it is factored out is open for
> debate.
>
> Ben Gardon (7):
> KVM: x86/MMU: Move pte_list operations to rmap.c
> KVM: x86/MMU: Move rmap_iterator to rmap.h
> KVM: x86/MMU: Move gfn_to_rmap() to rmap.c
> KVM: x86/MMU: Move rmap_can_add() and rmap_remove() to rmap.c
> KVM: x86/MMU: Move the rmap walk iterator out of mmu.c
> KVM: x86/MMU: Move rmap zap operations to rmap.c
> KVM: x86/MMU: Move rmap_add() to rmap.c
>
> arch/x86/kvm/Makefile | 2 +-
> arch/x86/kvm/debugfs.c | 1 +
> arch/x86/kvm/mmu/mmu.c | 438 +-------------------------------
> arch/x86/kvm/mmu/mmu_internal.h | 9 +-
> arch/x86/kvm/mmu/rmap.c | 368 +++++++++++++++++++++++++++
> arch/x86/kvm/mmu/rmap.h | 106 ++++++++
> 6 files changed, 492 insertions(+), 432 deletions(-)
> create mode 100644 arch/x86/kvm/mmu/rmap.c
> create mode 100644 arch/x86/kvm/mmu/rmap.h
>
> --
> 2.39.0.rc0.267.gcb52ba06e7-goog
>
* Re: [PATCH 7/7] KVM: x86/MMU: Move rmap_add() to rmap.c
2022-12-06 17:36 ` [PATCH 7/7] KVM: x86/MMU: Move rmap_add() " Ben Gardon
@ 2022-12-09 23:27 ` David Matlack
0 siblings, 0 replies; 21+ messages in thread
From: David Matlack @ 2022-12-09 23:27 UTC (permalink / raw)
To: Ben Gardon
Cc: linux-kernel, kvm, Paolo Bonzini, Peter Xu, Sean Christopherson,
Vipin Sharma
On Tue, Dec 06, 2022 at 05:36:01PM +0000, Ben Gardon wrote:
> Move rmap_add() to rmap.c to complete the migration of the various rmap
> operations out of mmu.c.
IMO rmap_{can_add,add,remove}() should stay in mmu.c since the
implementation of those functions is all Shadow MMU book-keeping that
just needs to be done when the rmap changes.
I would be in favor of giving them more accurate and MMU-related names
though. e.g.
rmap_can_add() -> kvm_vcpu_can_extend_rmap()
rmap_add() -> kvm_mmu_rmap_add()
rmap_remove() -> kvm_mmu_rmap_remove()
* Re: [PATCH 3/7] KVM: x86/MMU: Move gfn_to_rmap() to rmap.c
2022-12-06 17:35 ` [PATCH 3/7] KVM: x86/MMU: Move gfn_to_rmap() to rmap.c Ben Gardon
@ 2022-12-09 23:32 ` David Matlack
0 siblings, 0 replies; 21+ messages in thread
From: David Matlack @ 2022-12-09 23:32 UTC (permalink / raw)
To: Ben Gardon
Cc: linux-kernel, kvm, Paolo Bonzini, Peter Xu, Sean Christopherson,
Vipin Sharma
On Tue, Dec 06, 2022 at 05:35:57PM +0000, Ben Gardon wrote:
> Move gfn_to_rmap() to rmap.c. While the function is not part of
> manipulating the rmap, it is the main way that the MMU gets pointers to
> the rmaps.
>
> No functional change intended.
>
> Signed-off-by: Ben Gardon <bgardon@google.com>
> ---
[...]
> diff --git a/arch/x86/kvm/mmu/rmap.c b/arch/x86/kvm/mmu/rmap.c
> index c3bad366b627..272e89147d96 100644
> --- a/arch/x86/kvm/mmu/rmap.c
> +++ b/arch/x86/kvm/mmu/rmap.c
> @@ -200,3 +200,11 @@ u64 *rmap_get_next(struct rmap_iterator *iter)
> return sptep;
> }
>
> +struct kvm_rmap_head *gfn_to_rmap(gfn_t gfn, int level,
> + const struct kvm_memory_slot *slot)
> +{
> + unsigned long idx;
> +
> + idx = gfn_to_index(gfn, slot->base_gfn, level);
> + return &slot->arch.rmap[level - PG_LEVEL_4K][idx];
> +}
Optional: Since this is such a short function maybe just make it a
static inline in rmap.h?
* Re: [PATCH 1/7] KVM: x86/MMU: Move pte_list operations to rmap.c
2022-12-09 22:22 ` David Matlack
@ 2022-12-14 0:07 ` Ben Gardon
0 siblings, 0 replies; 21+ messages in thread
From: Ben Gardon @ 2022-12-14 0:07 UTC (permalink / raw)
To: David Matlack
Cc: linux-kernel, kvm, Paolo Bonzini, Peter Xu, Sean Christopherson,
Vipin Sharma
On Fri, Dec 9, 2022 at 2:22 PM David Matlack <dmatlack@google.com> wrote:
>
> On Tue, Dec 06, 2022 at 05:35:55PM +0000, Ben Gardon wrote:
> > In the interest of eventually splitting the Shadow MMU out of mmu.c,
> > start by moving some of the operations for manipulating pte_lists out of
> > mmu.c and into a new pair of files: rmap.c and rmap.h.
> >
> > No functional change intended.
> >
> > Signed-off-by: Ben Gardon <bgardon@google.com>
> > ---
> [...]
> > diff --git a/arch/x86/kvm/mmu/rmap.h b/arch/x86/kvm/mmu/rmap.h
> > new file mode 100644
> > index 000000000000..059765b6e066
> > --- /dev/null
> > +++ b/arch/x86/kvm/mmu/rmap.h
> > @@ -0,0 +1,34 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#ifndef __KVM_X86_MMU_RMAP_H
> > +#define __KVM_X86_MMU_RMAP_H
> > +
> > +#include <linux/kvm_host.h>
> > +
> > +/* make pte_list_desc fit well in cache lines */
> > +#define PTE_LIST_EXT 14
> > +
> > +/*
> > + * Slight optimization of cacheline layout, by putting `more' and `spte_count'
> > + * at the start; then accessing it will only use one single cacheline for
> > + * either full (entries==PTE_LIST_EXT) case or entries<=6.
> > + */
> > +struct pte_list_desc {
> > + struct pte_list_desc *more;
> > + /*
> > + * Stores number of entries stored in the pte_list_desc. No need to be
> > + * u64 but just for easier alignment. When PTE_LIST_EXT, means full.
> > + */
> > + u64 spte_count;
> > + u64 *sptes[PTE_LIST_EXT];
> > +};
> > +
> > +static struct kmem_cache *pte_list_desc_cache;
>
> The definition of pte_list_desc_cache needs to go in a C file since it's
> a global variable. Since it now needs to be accessed by more than once C
> file, drop the static. Then it can be accessed with extern.
>
> Since most of the code that sets up and deals with pte_list_desc_cache
> is still in mmu.c, my vote is to keep the definition there.
>
> i.e.
>
> mmu.c:
>
> struct kmem_cache *pte_list_desc_cache;
>
> rmap.c
>
> extern struct kmem_cache *pte_list_desc_cache;
>
> And no need for anything in rmap.h.
Right, good point. I'll fix that in the next edition.
* Re: [PATCH 1/7] KVM: x86/MMU: Move pte_list operations to rmap.c
2022-12-07 22:58 ` Vipin Sharma
@ 2022-12-14 0:11 ` Ben Gardon
0 siblings, 0 replies; 21+ messages in thread
From: Ben Gardon @ 2022-12-14 0:11 UTC (permalink / raw)
To: Vipin Sharma
Cc: linux-kernel, kvm, Paolo Bonzini, Peter Xu, Sean Christopherson,
David Matlack
On Wed, Dec 7, 2022 at 2:58 PM Vipin Sharma <vipinsh@google.com> wrote:
>
> On Tue, Dec 6, 2022 at 9:36 AM Ben Gardon <bgardon@google.com> wrote:
> >
> > In the interest of eventually splitting the Shadow MMU out of mmu.c,
> > start by moving some of the operations for manipulating pte_lists out of
> > mmu.c and into a new pair of files: rmap.c and rmap.h.
> >
> > No functional change intended.
> >
> > Signed-off-by: Ben Gardon <bgardon@google.com>
> > ---
> > arch/x86/kvm/Makefile | 2 +-
> > arch/x86/kvm/debugfs.c | 1 +
> > arch/x86/kvm/mmu/mmu.c | 152 +-------------------------------
> > arch/x86/kvm/mmu/mmu_internal.h | 1 -
> > arch/x86/kvm/mmu/rmap.c | 141 +++++++++++++++++++++++++++++
> > arch/x86/kvm/mmu/rmap.h | 34 +++++++
> > 6 files changed, 179 insertions(+), 152 deletions(-)
> > create mode 100644 arch/x86/kvm/mmu/rmap.c
> > create mode 100644 arch/x86/kvm/mmu/rmap.h
> >
> > diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
> > index 80e3fe184d17..9f766eebeddf 100644
> > --- a/arch/x86/kvm/Makefile
> > +++ b/arch/x86/kvm/Makefile
> > @@ -12,7 +12,7 @@ include $(srctree)/virt/kvm/Makefile.kvm
> > kvm-y += x86.o emulate.o i8259.o irq.o lapic.o \
> > i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \
> > hyperv.o debugfs.o mmu/mmu.o mmu/page_track.o \
> > - mmu/spte.o
> > + mmu/spte.o mmu/rmap.o
> >
> > ifdef CONFIG_HYPERV
> > kvm-y += kvm_onhyperv.o
> > diff --git a/arch/x86/kvm/debugfs.c b/arch/x86/kvm/debugfs.c
> > index c1390357126a..29f692ecd6f3 100644
> > --- a/arch/x86/kvm/debugfs.c
> > +++ b/arch/x86/kvm/debugfs.c
> > @@ -9,6 +9,7 @@
> > #include "lapic.h"
> > #include "mmu.h"
> > #include "mmu/mmu_internal.h"
> > +#include "mmu/rmap.h"
> >
> > static int vcpu_get_timer_advance_ns(void *data, u64 *val)
> > {
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 4736d7849c60..90b3735d6064 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -26,6 +26,7 @@
> > #include "kvm_emulate.h"
> > #include "cpuid.h"
> > #include "spte.h"
> > +#include "rmap.h"
> >
> > #include <linux/kvm_host.h>
> > #include <linux/types.h>
> > @@ -112,24 +113,6 @@ module_param(dbg, bool, 0644);
> >
> > #include <trace/events/kvm.h>
> >
> > -/* make pte_list_desc fit well in cache lines */
> > -#define PTE_LIST_EXT 14
> > -
> > -/*
> > - * Slight optimization of cacheline layout, by putting `more' and `spte_count'
> > - * at the start; then accessing it will only use one single cacheline for
> > - * either full (entries==PTE_LIST_EXT) case or entries<=6.
> > - */
> > -struct pte_list_desc {
> > - struct pte_list_desc *more;
> > - /*
> > - * Stores number of entries stored in the pte_list_desc. No need to be
> > - * u64 but just for easier alignment. When PTE_LIST_EXT, means full.
> > - */
> > - u64 spte_count;
> > - u64 *sptes[PTE_LIST_EXT];
> > -};
> > -
> > struct kvm_shadow_walk_iterator {
> > u64 addr;
> > hpa_t shadow_addr;
> > @@ -155,7 +138,6 @@ struct kvm_shadow_walk_iterator {
> > ({ spte = mmu_spte_get_lockless(_walker.sptep); 1; }); \
> > __shadow_walk_next(&(_walker), spte))
> >
> > -static struct kmem_cache *pte_list_desc_cache;
> > struct kmem_cache *mmu_page_header_cache;
> > static struct percpu_counter kvm_total_used_mmu_pages;
> >
> > @@ -674,11 +656,6 @@ static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
> > kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
> > }
> >
> > -static void mmu_free_pte_list_desc(struct pte_list_desc *pte_list_desc)
> > -{
> > - kmem_cache_free(pte_list_desc_cache, pte_list_desc);
> > -}
> > -
> > static bool sp_has_gptes(struct kvm_mmu_page *sp);
> >
> > static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
> > @@ -878,111 +855,6 @@ gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t gfn,
> > return slot;
> > }
> >
> > -/*
> > - * About rmap_head encoding:
> > - *
> > - * If the bit zero of rmap_head->val is clear, then it points to the only spte
> > - * in this rmap chain. Otherwise, (rmap_head->val & ~1) points to a struct
> > - * pte_list_desc containing more mappings.
> > - */
> > -
> > -/*
> > - * Returns the number of pointers in the rmap chain, not counting the new one.
> > - */
> > -static int pte_list_add(struct kvm_mmu_memory_cache *cache, u64 *spte,
> > - struct kvm_rmap_head *rmap_head)
> > -{
> > - struct pte_list_desc *desc;
> > - int count = 0;
> > -
> > - if (!rmap_head->val) {
> > - rmap_printk("%p %llx 0->1\n", spte, *spte);
> > - rmap_head->val = (unsigned long)spte;
> > - } else if (!(rmap_head->val & 1)) {
> > - rmap_printk("%p %llx 1->many\n", spte, *spte);
> > - desc = kvm_mmu_memory_cache_alloc(cache);
> > - desc->sptes[0] = (u64 *)rmap_head->val;
> > - desc->sptes[1] = spte;
> > - desc->spte_count = 2;
> > - rmap_head->val = (unsigned long)desc | 1;
> > - ++count;
> > - } else {
> > - rmap_printk("%p %llx many->many\n", spte, *spte);
> > - desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
> > - while (desc->spte_count == PTE_LIST_EXT) {
> > - count += PTE_LIST_EXT;
> > - if (!desc->more) {
> > - desc->more = kvm_mmu_memory_cache_alloc(cache);
> > - desc = desc->more;
> > - desc->spte_count = 0;
> > - break;
> > - }
> > - desc = desc->more;
> > - }
> > - count += desc->spte_count;
> > - desc->sptes[desc->spte_count++] = spte;
> > - }
> > - return count;
> > -}
> > -
> > -static void
> > -pte_list_desc_remove_entry(struct kvm_rmap_head *rmap_head,
> > - struct pte_list_desc *desc, int i,
> > - struct pte_list_desc *prev_desc)
> > -{
> > - int j = desc->spte_count - 1;
> > -
> > - desc->sptes[i] = desc->sptes[j];
> > - desc->sptes[j] = NULL;
> > - desc->spte_count--;
> > - if (desc->spte_count)
> > - return;
> > - if (!prev_desc && !desc->more)
> > - rmap_head->val = 0;
> > - else
> > - if (prev_desc)
> > - prev_desc->more = desc->more;
> > - else
> > - rmap_head->val = (unsigned long)desc->more | 1;
> > - mmu_free_pte_list_desc(desc);
> > -}
> > -
> > -static void pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head)
> > -{
> > - struct pte_list_desc *desc;
> > - struct pte_list_desc *prev_desc;
> > - int i;
> > -
> > - if (!rmap_head->val) {
> > - pr_err("%s: %p 0->BUG\n", __func__, spte);
> > - BUG();
> > - } else if (!(rmap_head->val & 1)) {
> > - rmap_printk("%p 1->0\n", spte);
> > - if ((u64 *)rmap_head->val != spte) {
> > - pr_err("%s: %p 1->BUG\n", __func__, spte);
> > - BUG();
> > - }
> > - rmap_head->val = 0;
> > - } else {
> > - rmap_printk("%p many->many\n", spte);
> > - desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
> > - prev_desc = NULL;
> > - while (desc) {
> > - for (i = 0; i < desc->spte_count; ++i) {
> > - if (desc->sptes[i] == spte) {
> > - pte_list_desc_remove_entry(rmap_head,
> > - desc, i, prev_desc);
> > - return;
> > - }
> > - }
> > - prev_desc = desc;
> > - desc = desc->more;
> > - }
> > - pr_err("%s: %p many->many\n", __func__, spte);
> > - BUG();
> > - }
> > -}
> > -
> > static void kvm_zap_one_rmap_spte(struct kvm *kvm,
> > struct kvm_rmap_head *rmap_head, u64 *sptep)
> > {
> > @@ -1011,7 +883,7 @@ static bool kvm_zap_all_rmap_sptes(struct kvm *kvm,
> > for (i = 0; i < desc->spte_count; i++)
> > mmu_spte_clear_track_bits(kvm, desc->sptes[i]);
> > next = desc->more;
> > - mmu_free_pte_list_desc(desc);
> > + free_pte_list_desc(desc);
> > }
> > out:
> > /* rmap_head is meaningless now, remember to reset it */
> > @@ -1019,26 +891,6 @@ static bool kvm_zap_all_rmap_sptes(struct kvm *kvm,
> > return true;
> > }
> >
> > -unsigned int pte_list_count(struct kvm_rmap_head *rmap_head)
> > -{
> > - struct pte_list_desc *desc;
> > - unsigned int count = 0;
> > -
> > - if (!rmap_head->val)
> > - return 0;
> > - else if (!(rmap_head->val & 1))
> > - return 1;
> > -
> > - desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
> > -
> > - while (desc) {
> > - count += desc->spte_count;
> > - desc = desc->more;
> > - }
> > -
> > - return count;
> > -}
> > -
> > static struct kvm_rmap_head *gfn_to_rmap(gfn_t gfn, int level,
> > const struct kvm_memory_slot *slot)
> > {
> > diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> > index dbaf6755c5a7..cd1c8f32269d 100644
> > --- a/arch/x86/kvm/mmu/mmu_internal.h
> > +++ b/arch/x86/kvm/mmu/mmu_internal.h
> > @@ -166,7 +166,6 @@ bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
> > int min_level);
> > void kvm_flush_remote_tlbs_with_address(struct kvm *kvm,
> > u64 start_gfn, u64 pages);
> > -unsigned int pte_list_count(struct kvm_rmap_head *rmap_head);
> >
> > extern int nx_huge_pages;
> > static inline bool is_nx_huge_page_enabled(struct kvm *kvm)
> > diff --git a/arch/x86/kvm/mmu/rmap.c b/arch/x86/kvm/mmu/rmap.c
> > new file mode 100644
> > index 000000000000..daa99dee0709
> > --- /dev/null
> > +++ b/arch/x86/kvm/mmu/rmap.c
> > @@ -0,0 +1,141 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
>
> A comment would be nice here describing what code lives in this file and
> what is expected of it.
I'll add one.
>
> > +#include "mmu.h"
> > +#include "mmu_internal.h"
> > +#include "mmutrace.h"
> > +#include "rmap.h"
> > +#include "spte.h"
> > +
> > +#include <asm/cmpxchg.h>
> > +#include <trace/events/kvm.h>
> > +
> > +/*
> > + * About rmap_head encoding:
> > + *
> > + * If the bit zero of rmap_head->val is clear, then it points to the only spte
> > + * in this rmap chain. Otherwise, (rmap_head->val & ~1) points to a struct
> > + * pte_list_desc containing more mappings.
> > + */
> > +
> > +/*
> > + * Returns the number of pointers in the rmap chain, not counting the new one.
> > + */
> > +int pte_list_add(struct kvm_mmu_memory_cache *cache, u64 *spte,
> > + struct kvm_rmap_head *rmap_head)
> > +{
> > + struct pte_list_desc *desc;
> > + int count = 0;
> > +
> > + if (!rmap_head->val) {
> > + rmap_printk("%p %llx 0->1\n", spte, *spte);
> > + rmap_head->val = (unsigned long)spte;
> > + } else if (!(rmap_head->val & 1)) {
> > + rmap_printk("%p %llx 1->many\n", spte, *spte);
> > + desc = kvm_mmu_memory_cache_alloc(cache);
> > + desc->sptes[0] = (u64 *)rmap_head->val;
> > + desc->sptes[1] = spte;
> > + desc->spte_count = 2;
> > + rmap_head->val = (unsigned long)desc | 1;
> > + ++count;
> > + } else {
> > + rmap_printk("%p %llx many->many\n", spte, *spte);
> > + desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
> > + while (desc->spte_count == PTE_LIST_EXT) {
> > + count += PTE_LIST_EXT;
> > + if (!desc->more) {
> > + desc->more = kvm_mmu_memory_cache_alloc(cache);
> > + desc = desc->more;
> > + desc->spte_count = 0;
> > + break;
> > + }
> > + desc = desc->more;
> > + }
> > + count += desc->spte_count;
> > + desc->sptes[desc->spte_count++] = spte;
> > + }
> > + return count;
> > +}
> > +
> > +void free_pte_list_desc(struct pte_list_desc *pte_list_desc)
> > +{
> > + kmem_cache_free(pte_list_desc_cache, pte_list_desc);
> > +}
> > +
> > +static void
> > +pte_list_desc_remove_entry(struct kvm_rmap_head *rmap_head,
> > + struct pte_list_desc *desc, int i,
> > + struct pte_list_desc *prev_desc)
> > +{
> > + int j = desc->spte_count - 1;
> > +
> > + desc->sptes[i] = desc->sptes[j];
> > + desc->sptes[j] = NULL;
> > + desc->spte_count--;
> > + if (desc->spte_count)
> > + return;
> > + if (!prev_desc && !desc->more)
> > + rmap_head->val = 0;
> > + else
> > + if (prev_desc)
> > + prev_desc->more = desc->more;
> > + else
> > + rmap_head->val = (unsigned long)desc->more | 1;
> > + free_pte_list_desc(desc);
> > +}
> > +
> > +void pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head)
> > +{
> > + struct pte_list_desc *desc;
> > + struct pte_list_desc *prev_desc;
> > + int i;
> > +
> > + if (!rmap_head->val) {
> > + pr_err("%s: %p 0->BUG\n", __func__, spte);
> > + BUG();
> > + } else if (!(rmap_head->val & 1)) {
> > + rmap_printk("%p 1->0\n", spte);
> > + if ((u64 *)rmap_head->val != spte) {
> > + pr_err("%s: %p 1->BUG\n", __func__, spte);
> > + BUG();
> > + }
> > + rmap_head->val = 0;
> > + } else {
> > + rmap_printk("%p many->many\n", spte);
> > + desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
> > + prev_desc = NULL;
> > + while (desc) {
> > + for (i = 0; i < desc->spte_count; ++i) {
> > + if (desc->sptes[i] == spte) {
> > + pte_list_desc_remove_entry(rmap_head,
> > + desc, i, prev_desc);
> > + return;
> > + }
> > + }
> > + prev_desc = desc;
> > + desc = desc->more;
> > + }
> > + pr_err("%s: %p many->many\n", __func__, spte);
> > + BUG();
> > + }
> > +}
> > +
> > +unsigned int pte_list_count(struct kvm_rmap_head *rmap_head)
> > +{
> > + struct pte_list_desc *desc;
> > + unsigned int count = 0;
> > +
> > + if (!rmap_head->val)
> > + return 0;
> > + else if (!(rmap_head->val & 1))
> > + return 1;
> > +
> > + desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
> > +
> > + while (desc) {
> > + count += desc->spte_count;
> > + desc = desc->more;
> > + }
> > +
> > + return count;
> > +}
> > +
> > diff --git a/arch/x86/kvm/mmu/rmap.h b/arch/x86/kvm/mmu/rmap.h
> > new file mode 100644
> > index 000000000000..059765b6e066
> > --- /dev/null
> > +++ b/arch/x86/kvm/mmu/rmap.h
> > @@ -0,0 +1,34 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#ifndef __KVM_X86_MMU_RMAP_H
> > +#define __KVM_X86_MMU_RMAP_H
> > +
> > +#include <linux/kvm_host.h>
> > +
> > +/* make pte_list_desc fit well in cache lines */
> > +#define PTE_LIST_EXT 14
> > +
> > +/*
> > + * Slight optimization of cacheline layout, by putting `more' and `spte_count'
> > + * at the start; then accessing it will only use one single cacheline for
> > + * either full (entries==PTE_LIST_EXT) case or entries<=6.
> > + */
> > +struct pte_list_desc {
> > + struct pte_list_desc *more;
> > + /*
> > + * Stores number of entries stored in the pte_list_desc. No need to be
> > + * u64 but just for easier alignment. When PTE_LIST_EXT, means full.
> > + */
> > + u64 spte_count;
> > + u64 *sptes[PTE_LIST_EXT];
> > +};
> > +
> > +static struct kmem_cache *pte_list_desc_cache;
>
> Does it make sense to make it non-static and declare it extern here? Also, you
> could provide an init function which can be called from mmu.c.
Going to follow David's suggestion and leave it in mmu.c for now.
>
>
> > +
> > +int pte_list_add(struct kvm_mmu_memory_cache *cache, u64 *spte,
> > + struct kvm_rmap_head *rmap_head);
> > +void free_pte_list_desc(struct pte_list_desc *pte_list_desc);
> > +void pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head);
> > +unsigned int pte_list_count(struct kvm_rmap_head *rmap_head);
> > +
>
> Similar to the tdp_mmu functions, and the other rmap functions in later patches
> in the series, should the functions above be prefixed with "rmap_"?
I think I'm going to abandon the idea of having a separate file for
rmap stuff and just have one, larger shadow mmu file with a variety of
names. I'll clean up the naming at the end of the series once
everything is moved over and the set of things being exported from the
shadow_mmu.c file has stabilized.
>
>
> > +#endif /* __KVM_X86_MMU_RMAP_H */
> > --
> > 2.39.0.rc0.267.gcb52ba06e7-goog
> >
* Re: [PATCH 2/7] KVM: x86/MMU: Move rmap_iterator to rmap.h
2022-12-09 23:04 ` David Matlack
@ 2022-12-14 0:12 ` Ben Gardon
2022-12-14 0:59 ` Sean Christopherson
0 siblings, 1 reply; 21+ messages in thread
From: Ben Gardon @ 2022-12-14 0:12 UTC (permalink / raw)
To: David Matlack
Cc: linux-kernel, kvm, Paolo Bonzini, Peter Xu, Sean Christopherson,
Vipin Sharma
On Fri, Dec 9, 2022 at 3:04 PM David Matlack <dmatlack@google.com> wrote:
>
> On Tue, Dec 06, 2022 at 05:35:56PM +0000, Ben Gardon wrote:
> > In continuing to factor the rmap out of mmu.c, move the rmap_iterator
> > and associated functions and macros into rmap.(c|h).
> >
> > No functional change intended.
> >
> > Signed-off-by: Ben Gardon <bgardon@google.com>
> > ---
> > arch/x86/kvm/mmu/mmu.c | 76 -----------------------------------------
> > arch/x86/kvm/mmu/rmap.c | 61 +++++++++++++++++++++++++++++++++
> > arch/x86/kvm/mmu/rmap.h | 18 ++++++++++
> > 3 files changed, 79 insertions(+), 76 deletions(-)
> >
> [...]
> > diff --git a/arch/x86/kvm/mmu/rmap.h b/arch/x86/kvm/mmu/rmap.h
> > index 059765b6e066..13b265f3a95e 100644
> > --- a/arch/x86/kvm/mmu/rmap.h
> > +++ b/arch/x86/kvm/mmu/rmap.h
> > @@ -31,4 +31,22 @@ void free_pte_list_desc(struct pte_list_desc *pte_list_desc);
> > void pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head);
> > unsigned int pte_list_count(struct kvm_rmap_head *rmap_head);
> >
> > +/*
> > + * Used by the following functions to iterate through the sptes linked by a
> > + * rmap. All fields are private and not assumed to be used outside.
> > + */
> > +struct rmap_iterator {
> > + /* private fields */
> > + struct pte_list_desc *desc; /* holds the sptep if not NULL */
> > + int pos; /* index of the sptep */
> > +};
> > +
> > +u64 *rmap_get_first(struct kvm_rmap_head *rmap_head,
> > + struct rmap_iterator *iter);
> > +u64 *rmap_get_next(struct rmap_iterator *iter);
> > +
> > +#define for_each_rmap_spte(_rmap_head_, _iter_, _spte_) \
> > + for (_spte_ = rmap_get_first(_rmap_head_, _iter_); \
> > + _spte_; _spte_ = rmap_get_next(_iter_))
> > +
>
> I always found these function names and kvm_rmap_head confusing since
> they are about iterating through the pte_list_desc data structure. The
> rmap (gfn -> list of sptes) is a specific application of the
> pte_list_desc structure, but not the only application. There's also
> parent_ptes in struct kvm_mmu_page, which is not an rmap, just a plain
> old list of ptes.
>
> While you are refactoring this code, what do you think about doing the
> following renames?
>
> struct kvm_rmap_head -> struct pte_list_head
> struct rmap_iterator -> struct pte_list_iterator
> rmap_get_first() -> pte_list_get_first()
> rmap_get_next() -> pte_list_get_next()
> for_each_rmap_spte() -> for_each_pte_list_entry()
>
> Then we can reserve the term "rmap" just for the actual rmap
> (slot->arch.rmap), and code that deals with sp->parent_ptes will become
> a lot more clear IMO (because it will no longer mention rmap).
>
> e.g. We go from this:
>
> struct rmap_iterator iter;
> u64 *sptep;
>
> for_each_rmap_spte(&sp->parent_ptes, &iter, sptep) {
> ...
> }
>
> To this:
>
> struct pte_list_iterator iter;
> u64 *sptep;
>
> for_each_pte_list_entry(&sp->parent_ptes, &iter, sptep) {
> ...
> }
I like this suggestion, and I do think it'll make things more
readable. It's going to be a huge patch to rename all the instances of
kvm_rmap_head, but it's probably worth it.
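[For readers following along: the encoding being discussed — a single `kvm_rmap_head` value that is either a bare sptep or, with bit 0 set, a pointer to a chained `pte_list_desc` — can be sketched roughly as below. This is a simplified illustration with invented field sizes and simplified types, not the kernel's exact definitions.]

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative descriptor capacity; the kernel's value differs. */
#define PTE_LIST_EXT 14

struct pte_list_desc {
	struct pte_list_desc *more;	/* next descriptor in the chain */
	unsigned long spte_count;	/* valid entries in sptes[] */
	unsigned long *sptes[PTE_LIST_EXT];
};

struct kvm_rmap_head {
	/*
	 * Bit 0 clear: val is a single sptep (the common case).
	 * Bit 0 set:   val & ~1 points to a pte_list_desc chain.
	 */
	unsigned long val;
};

struct rmap_iterator {
	struct pte_list_desc *desc;	/* NULL in the single-sptep case */
	int pos;			/* index into desc->sptes[] */
};

static unsigned long *rmap_get_first(struct kvm_rmap_head *rmap_head,
				     struct rmap_iterator *iter)
{
	if (!rmap_head->val)
		return NULL;

	if (!(rmap_head->val & 1)) {
		/* Single-sptep encoding: no descriptor to walk. */
		iter->desc = NULL;
		return (unsigned long *)rmap_head->val;
	}

	iter->desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
	iter->pos = 0;
	return iter->desc->sptes[0];
}

static unsigned long *rmap_get_next(struct rmap_iterator *iter)
{
	if (!iter->desc)
		return NULL;	/* single-sptep case is exhausted after one */

	if (++iter->pos < (int)iter->desc->spte_count)
		return iter->desc->sptes[iter->pos];

	/* Move to the next descriptor in the chain, if any. */
	iter->desc = iter->desc->more;
	iter->pos = 0;
	return iter->desc ? iter->desc->sptes[0] : NULL;
}

#define for_each_rmap_spte(_rmap_head_, _iter_, _spte_)		\
	for (_spte_ = rmap_get_first(_rmap_head_, _iter_);	\
	     _spte_; _spte_ = rmap_get_next(_iter_))
```

The bit-0 tag is the "fiddly bit flag" the cover letter mentions: most gfns are mapped by exactly one spte, so the head usually stores the sptep directly and a descriptor is only allocated once a second spte shows up.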
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 2/7] KVM: x86/MMU: Move rmap_iterator to rmap.h
2022-12-14 0:12 ` Ben Gardon
@ 2022-12-14 0:59 ` Sean Christopherson
2022-12-14 17:53 ` Ben Gardon
0 siblings, 1 reply; 21+ messages in thread
From: Sean Christopherson @ 2022-12-14 0:59 UTC (permalink / raw)
To: Ben Gardon
Cc: David Matlack, linux-kernel, kvm, Paolo Bonzini, Peter Xu,
Vipin Sharma
On Tue, Dec 13, 2022, Ben Gardon wrote:
> On Fri, Dec 9, 2022 at 3:04 PM David Matlack <dmatlack@google.com> wrote:
> >
> > > +/*
> > > + * Used by the following functions to iterate through the sptes linked by a
> > > + * rmap. All fields are private and not assumed to be used outside.
> > > + */
> > > +struct rmap_iterator {
> > > + /* private fields */
> > > + struct pte_list_desc *desc; /* holds the sptep if not NULL */
> > > + int pos; /* index of the sptep */
> > > +};
> > > +
> > > +u64 *rmap_get_first(struct kvm_rmap_head *rmap_head,
> > > + struct rmap_iterator *iter);
> > > +u64 *rmap_get_next(struct rmap_iterator *iter);
> > > +
> > > +#define for_each_rmap_spte(_rmap_head_, _iter_, _spte_) \
> > > + for (_spte_ = rmap_get_first(_rmap_head_, _iter_); \
> > > + _spte_; _spte_ = rmap_get_next(_iter_))
> > > +
> >
> > I always found these function names and kvm_rmap_head confusing since
Heh, you definitely aren't the only one.
> > they are about iterating through the pte_list_desc data structure. The
> > rmap (gfn -> list of sptes) is a specific application of the
> > pte_list_desc structure, but not the only application. There's also
> > parent_ptes in struct kvm_mmu_page, which is not an rmap, just a plain
> > old list of ptes.
>
> > While you are refactoring this code, what do you think about doing the
> > following renames?
> >
> > struct kvm_rmap_head -> struct pte_list_head
> > struct rmap_iterator -> struct pte_list_iterator
> > rmap_get_first() -> pte_list_get_first()
> > rmap_get_next() -> pte_list_get_next()
> > for_each_rmap_spte() -> for_each_pte_list_entry()
I would strongly prefer to keep "spte" in this one regardless of what other naming
changes we do (see below). Maybe just for_each_spte()? IMO, "pte_list_entry"
unnecessarily obfuscates that it's a list of SPTEs.
> > Then we can reserve the term "rmap" just for the actual rmap
> > (slot->arch.rmap), and code that deals with sp->parent_ptes will become
> > a lot more clear IMO (because it will not longer mention rmap).
> >
> > e.g. We go from this:
> >
> > struct rmap_iterator iter;
> > u64 *sptep;
> >
> > for_each_rmap_spte(&sp->parent_ptes, &iter, sptep) {
> > ...
> > }
> >
> > To this:
> >
> > struct pte_list_iterator iter;
> > u64 *sptep;
> >
> > for_each_pte_list_entry(&sp->parent_ptes, &iter, sptep) {
> > ...
> > }
>
> I like this suggestion, and I do think it'll make things more
> readable. It's going to be a huge patch to rename all the instances of
> kvm_rmap_head, but it's probably worth it.
I generally like this idea too, but tying into my above comment, before jumping
in I think we should figure out what end state we want, i.e. get the bikeshedding
out of the way now to hopefully avoid dragging out a series while various things
get nitpicked.
E.g. if we just rename the structs and their macros, then we'll end up with
things like
static bool slot_rmap_write_protect(struct kvm *kvm,
struct pte_list_head *rmap_head,
const struct kvm_memory_slot *slot)
{
return rmap_write_protect(rmap_head, false);
}
which isn't terrible, but there's still opportunity for cleanup, e.g.
rmap_write_protect() could easily be sptes_write_protect() or write_protect_sptes().
That will generate a naming conflict of sorts with pte_list_head if we don't also
rename that to spte_list_head. And I think capturing that it's a list of SPTEs and
not guest PTEs will be helpful in general.
And if we rename pte_list_head, then we might as well commit 100% and use consistent
nomenclature across the board, e.g. end up with
static bool sptes_clear_dirty(struct kvm *kvm, struct sptes_list_head *head,
const struct kvm_memory_slot *slot)
{
u64 *sptep;
struct spte_list_iterator iter;
bool flush = false;
for_each_spte(head, &iter, sptep) {
if (spte_ad_need_write_protect(*sptep))
flush |= spte_wrprot_for_clear_dirty(sptep);
else
flush |= spte_clear_dirty(sptep);
}
return flush;
}
versus the current
static bool __rmap_clear_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
const struct kvm_memory_slot *slot)
{
u64 *sptep;
struct rmap_iterator iter;
bool flush = false;
for_each_rmap_spte(rmap_head, &iter, sptep)
if (spte_ad_need_write_protect(*sptep))
flush |= spte_wrprot_for_clear_dirty(sptep);
else
flush |= spte_clear_dirty(sptep);
return flush;
}
* Re: [PATCH 2/7] KVM: x86/MMU: Move rmap_iterator to rmap.h
2022-12-14 0:59 ` Sean Christopherson
@ 2022-12-14 17:53 ` Ben Gardon
2022-12-15 0:34 ` Sean Christopherson
0 siblings, 1 reply; 21+ messages in thread
From: Ben Gardon @ 2022-12-14 17:53 UTC (permalink / raw)
To: Sean Christopherson
Cc: David Matlack, linux-kernel, kvm, Paolo Bonzini, Peter Xu,
Vipin Sharma
On Tue, Dec 13, 2022 at 4:59 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, Dec 13, 2022, Ben Gardon wrote:
> > On Fri, Dec 9, 2022 at 3:04 PM David Matlack <dmatlack@google.com> wrote:
> > >
> > > > +/*
> > > > + * Used by the following functions to iterate through the sptes linked by a
> > > > + * rmap. All fields are private and not assumed to be used outside.
> > > > + */
> > > > +struct rmap_iterator {
> > > > + /* private fields */
> > > > + struct pte_list_desc *desc; /* holds the sptep if not NULL */
> > > > + int pos; /* index of the sptep */
> > > > +};
> > > > +
> > > > +u64 *rmap_get_first(struct kvm_rmap_head *rmap_head,
> > > > + struct rmap_iterator *iter);
> > > > +u64 *rmap_get_next(struct rmap_iterator *iter);
> > > > +
> > > > +#define for_each_rmap_spte(_rmap_head_, _iter_, _spte_) \
> > > > + for (_spte_ = rmap_get_first(_rmap_head_, _iter_); \
> > > > + _spte_; _spte_ = rmap_get_next(_iter_))
> > > > +
> > >
> > > I always found these function names and kvm_rmap_head confusing since
>
> Heh, you definitely aren't the only one.
>
> > > they are about iterating through the pte_list_desc data structure. The
> > > rmap (gfn -> list of sptes) is a specific application of the
> > > pte_list_desc structure, but not the only application. There's also
> > > parent_ptes in struct kvm_mmu_page, which is not an rmap, just a plain
> > > old list of ptes.
> >
> > > While you are refactoring this code, what do you think about doing the
> > > following renames?
> > >
> > > struct kvm_rmap_head -> struct pte_list_head
> > > struct rmap_iterator -> struct pte_list_iterator
> > > rmap_get_first() -> pte_list_get_first()
> > > rmap_get_next() -> pte_list_get_next()
> > > for_each_rmap_spte() -> for_each_pte_list_entry()
>
> I would strongly prefer to keep "spte" in this one regardless of what other naming
> changes we do (see below). Maybe just for_each_spte()? IMO, "pte_list_entry"
> unnecessarily obfuscates that it's a list of SPTEs.
>
> > > Then we can reserve the term "rmap" just for the actual rmap
> > > (slot->arch.rmap), and code that deals with sp->parent_ptes will become
> > > a lot more clear IMO (because it will no longer mention rmap).
> > >
> > > e.g. We go from this:
> > >
> > > struct rmap_iterator iter;
> > > u64 *sptep;
> > >
> > > for_each_rmap_spte(&sp->parent_ptes, &iter, sptep) {
> > > ...
> > > }
> > >
> > > To this:
> > >
> > > struct pte_list_iterator iter;
> > > u64 *sptep;
> > >
> > > for_each_pte_list_entry(&sp->parent_ptes, &iter, sptep) {
> > > ...
> > > }
> >
> > I like this suggestion, and I do think it'll make things more
> > readable. It's going to be a huge patch to rename all the instances of
> > kvm_rmap_head, but it's probably worth it.
>
> I generally like this idea too, but tying into my above comment, before jumping
> in I think we should figure out what end state we want, i.e. get the bikeshedding
> out of the way now to hopefully avoid dragging out a series while various things
> get nitpicked.
>
> E.g. if we just rename the structs and their macros, then we'll end up with
> things like
>
> static bool slot_rmap_write_protect(struct kvm *kvm,
> struct pte_list_head *rmap_head,
> const struct kvm_memory_slot *slot)
> {
> return rmap_write_protect(rmap_head, false);
> }
>
> which isn't terrible, but there's still opportunity for cleanup, e.g.
> rmap_write_protect() could easily be sptes_write_protect() or write_protect_sptes().
>
> That will generate a naming conflict of sorts with pte_list_head if we don't also
> rename that to spte_list_head. And I think capturing that it's a list of SPTEs and
> not guest PTEs will be helpful in general.
>
> And if we rename pte_list_head, then we might as well commit 100% and use consistent
> nomenclature across the board, e.g. end up with
>
> static bool sptes_clear_dirty(struct kvm *kvm, struct sptes_list_head *head,
> const struct kvm_memory_slot *slot)
> {
> u64 *sptep;
> struct spte_list_iterator iter;
> bool flush = false;
>
> for_each_spte(head, &iter, sptep) {
> if (spte_ad_need_write_protect(*sptep))
> flush |= spte_wrprot_for_clear_dirty(sptep);
> else
> flush |= spte_clear_dirty(sptep);
> }
>
> return flush;
> }
>
> versus the current
>
> static bool __rmap_clear_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
> const struct kvm_memory_slot *slot)
> {
> u64 *sptep;
> struct rmap_iterator iter;
> bool flush = false;
>
> for_each_rmap_spte(rmap_head, &iter, sptep)
> if (spte_ad_need_write_protect(*sptep))
> flush |= spte_wrprot_for_clear_dirty(sptep);
> else
> flush |= spte_clear_dirty(sptep);
>
> return flush;
> }
I'd be happy to see some consistent SPTE-based naming in the Shadow
MMU and more or less get rid of the rmap naming scheme. Once you
change to spte_list_head or whatever, the use of the actual rmap (an
array of spte_list_heads) becomes super narrow.
Given the potential for enormous scope creep on what's already going
to be a long series, I'm inclined to split this work into two parts:
1. Move code from mmu.c to shadow_mmu.c with minimal cleanups /
refactors / renames; just move the code
2. Clean up naming conventions: make the functions exported in
shadow_mmu.h consistent, get rid of the whole rmap naming scheme, etc.
That way git-blame will preserve context around the renames /
refactors which would be obfuscated if we did 2 before 1, and we can
reduce merge conflicts.
* Re: [PATCH 2/7] KVM: x86/MMU: Move rmap_iterator to rmap.h
2022-12-14 17:53 ` Ben Gardon
@ 2022-12-15 0:34 ` Sean Christopherson
0 siblings, 0 replies; 21+ messages in thread
From: Sean Christopherson @ 2022-12-15 0:34 UTC (permalink / raw)
To: Ben Gardon
Cc: David Matlack, linux-kernel, kvm, Paolo Bonzini, Peter Xu,
Vipin Sharma
On Wed, Dec 14, 2022, Ben Gardon wrote:
> On Tue, Dec 13, 2022 at 4:59 PM Sean Christopherson <seanjc@google.com> wrote:
> > And if we rename pte_list_head, then we might as well commit 100% and use consistent
> > nomenclature across the board, e.g. end up with
...
> I'd be happy to see some consistent SPTE-based naming in the Shadow
> MMU and more or less get rid of the rmap naming scheme. Once you
> change to spte_list_head or whatever, the use of the actual rmap (an
> array of spte_list_heads) becomes super narrow.
Yeah. And at least for me, the more literal "walk a list of SPTEs" is much
easier for me to wrap my head around than "walk rmaps".
> Given the potential for enormous scope creep on what's already going
> to be a long series, I'm inclined to split this work into two parts:
> 1. Move code from mmu.c to shadow_mmu.c with minimal cleanups /
> refactors / renames; just move the code
> 2. Clean up naming conventions: make the functions exported in
> shadow_mmu.h consistent, get rid of the whole rmap naming scheme, etc.
>
> That way git-blame will preserve context around the renames /
> refactors which would be obfuscated if we did 2 before 1,
+1
> and we can reduce merge conflicts.
That might be wishful thinking ;-)
One thought for the rename would be to gather all the reviews and feedback, and
then wait to send the final version until shortly before the merge window, i.e.
wait for everything else to land so that only future development gets affected.
That would give Paolo and me a bit of extra motivation to get the x86 queue
solidified sooner than later too...
end of thread, other threads:[~2022-12-15 0:34 UTC | newest]
Thread overview: 21+ messages
2022-12-06 17:35 [PATCH 0/7] KVM: x86/MMU: Factor rmap operations out of mmu.c Ben Gardon
2022-12-06 17:35 ` [PATCH 1/7] KVM: x86/MMU: Move pte_list operations to rmap.c Ben Gardon
2022-12-07 22:58 ` Vipin Sharma
2022-12-14 0:11 ` Ben Gardon
2022-12-09 22:22 ` David Matlack
2022-12-14 0:07 ` Ben Gardon
2022-12-06 17:35 ` [PATCH 2/7] KVM: x86/MMU: Move rmap_iterator to rmap.h Ben Gardon
2022-12-09 23:04 ` David Matlack
2022-12-14 0:12 ` Ben Gardon
2022-12-14 0:59 ` Sean Christopherson
2022-12-14 17:53 ` Ben Gardon
2022-12-15 0:34 ` Sean Christopherson
2022-12-06 17:35 ` [PATCH 3/7] KVM: x86/MMU: Move gfn_to_rmap() to rmap.c Ben Gardon
2022-12-09 23:32 ` David Matlack
2022-12-06 17:35 ` [PATCH 4/7] KVM: x86/MMU: Move rmap_can_add() and rmap_remove() " Ben Gardon
2022-12-06 17:35 ` [PATCH 5/7] KVM: x86/MMU: Move the rmap walk iterator out of mmu.c Ben Gardon
2022-12-06 17:36 ` [PATCH 6/7] KVM: x86/MMU: Move rmap zap operations to rmap.c Ben Gardon
2022-12-09 22:44 ` David Matlack
2022-12-06 17:36 ` [PATCH 7/7] KVM: x86/MMU: Move rmap_add() " Ben Gardon
2022-12-09 23:27 ` David Matlack
2022-12-09 23:14 ` [PATCH 0/7] KVM: x86/MMU: Factor rmap operations out of mmu.c David Matlack