public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] KVM: x86: Fast forward the iterator when zapping the TDP MMU
@ 2024-10-23  9:18 Bernhard Kauer
  2024-10-23 23:27 ` Sean Christopherson
  0 siblings, 1 reply; 4+ messages in thread
From: Bernhard Kauer @ 2024-10-23  9:18 UTC (permalink / raw)
  To: kvm; +Cc: Bernhard Kauer

Zapping a root means scanning for present entries in a page-table
hierarchy. This process is relatively slow since it needs to be
preemptible as millions of entries might be processed.

Furthermore the root-page is traversed multiple times as zapping
is done with increasing page-sizes.

Optimizing for the not-present case speeds up the hello microbenchmark
by 115 microseconds.

Signed-off-by: Bernhard Kauer <bk@alpico.io>
---
 arch/x86/kvm/mmu/tdp_iter.h | 21 +++++++++++++++++++++
 arch/x86/kvm/mmu/tdp_mmu.c  |  2 +-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
index 2880fd392e0c..7ad28ac2c6b8 100644
--- a/arch/x86/kvm/mmu/tdp_iter.h
+++ b/arch/x86/kvm/mmu/tdp_iter.h
@@ -130,6 +130,27 @@ struct tdp_iter {
 #define for_each_tdp_pte(iter, root, start, end) \
 	for_each_tdp_pte_min_level(iter, root, PG_LEVEL_4K, start, end)
 
+
+/*
+ * Skip up to count not present entries of the iterator. Returns true
+ * if the final entry is not present.
+ */
+static inline bool tdp_iter_skip_not_present(struct tdp_iter *iter, int count)
+{
+	int i;
+	int pos;
+
+	pos = SPTE_INDEX(iter->gfn << PAGE_SHIFT, iter->level);
+	count = min(count, SPTE_ENT_PER_PAGE - 1 - pos);
+	for (i = 0; i < count && !is_shadow_present_pte(iter->old_spte); i++)
+		iter->old_spte = kvm_tdp_mmu_read_spte(iter->sptep + i + 1);
+
+	iter->gfn += i * KVM_PAGES_PER_HPAGE(iter->level);
+	iter->next_last_level_gfn = iter->gfn;
+	iter->sptep += i;
+	return !is_shadow_present_pte(iter->old_spte);
+}
+
 tdp_ptep_t spte_to_child_pt(u64 pte, int level);
 
 void tdp_iter_start(struct tdp_iter *iter, struct kvm_mmu_page *root,
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 1951f76db657..404726511f95 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -750,7 +750,7 @@ static void __tdp_mmu_zap_root(struct kvm *kvm, struct kvm_mmu_page *root,
 		if (tdp_mmu_iter_cond_resched(kvm, &iter, false, shared))
 			continue;
 
-		if (!is_shadow_present_pte(iter.old_spte))
+		if (tdp_iter_skip_not_present(&iter, 32))
 			continue;
 
 		if (iter.level > zap_level)
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] KVM: x86: Fast forward the iterator when zapping the TDP MMU
  2024-10-23  9:18 [PATCH] KVM: x86: Fast forward the iterator when zapping the TDP MMU Bernhard Kauer
@ 2024-10-23 23:27 ` Sean Christopherson
  2024-10-24 10:25   ` Bernhard Kauer
  0 siblings, 1 reply; 4+ messages in thread
From: Sean Christopherson @ 2024-10-23 23:27 UTC (permalink / raw)
  To: Bernhard Kauer; +Cc: kvm

On Wed, Oct 23, 2024, Bernhard Kauer wrote:
> Zapping a root means scanning for present entries in a page-table
> hierarchy. This process is relatively slow since it needs to be
> preemptible as millions of entries might be processed.
> 
> Furthermore the root-page is traversed multiple times as zapping
> is done with increasing page-sizes.
> 
> Optimizing for the not-present case speeds up the hello microbenchmark
> by 115 microseconds.

What is the "hello" microbenchmark?  Do we actually care if it's faster?

Are you able to determine exactly what makes iteration slow?  Partly because I'm
curious, partly because it would be helpful to be able to avoid problematic
patterns in the future, and partly because maybe there's a more elegant solution.

Regardless of why iteration is slow, I would much prefer to solve this for all
users of the iterator.  E.g. very lightly tested, and not 100% optimized (though
should be on par with the below).

---
 arch/x86/kvm/mmu/tdp_iter.c | 33 ++++++++++++++++-----------------
 arch/x86/kvm/mmu/tdp_iter.h | 19 +++++++++++++------
 arch/x86/kvm/mmu/tdp_mmu.c  | 26 ++++++++++----------------
 3 files changed, 39 insertions(+), 39 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_iter.c b/arch/x86/kvm/mmu/tdp_iter.c
index 04c247bfe318..53aebc044ca6 100644
--- a/arch/x86/kvm/mmu/tdp_iter.c
+++ b/arch/x86/kvm/mmu/tdp_iter.c
@@ -37,7 +37,7 @@ void tdp_iter_restart(struct tdp_iter *iter)
  * rooted at root_pt, starting with the walk to translate next_last_level_gfn.
  */
 void tdp_iter_start(struct tdp_iter *iter, struct kvm_mmu_page *root,
-		    int min_level, gfn_t next_last_level_gfn)
+		    int min_level, bool only_present, gfn_t next_last_level_gfn)
 {
 	if (WARN_ON_ONCE(!root || (root->role.level < 1) ||
 			 (root->role.level > PT64_ROOT_MAX_LEVEL))) {
@@ -45,6 +45,7 @@ void tdp_iter_start(struct tdp_iter *iter, struct kvm_mmu_page *root,
 		return;
 	}
 
+	iter->only_shadow_present = only_present;
 	iter->next_last_level_gfn = next_last_level_gfn;
 	iter->root_level = root->role.level;
 	iter->min_level = min_level;
@@ -103,26 +104,24 @@ static bool try_step_down(struct tdp_iter *iter)
 /*
  * Steps to the next entry in the current page table, at the current page table
  * level. The next entry could point to a page backing guest memory or another
- * page table, or it could be non-present. Returns true if the iterator was
- * able to step to the next entry in the page table, false if the iterator was
- * already at the end of the current page table.
+ * page table, or it could be non-present. Skips non-present entries if the
+ * iterator is configured to process only shadow-present entries. Returns true
+ * if the iterator was able to step to the next entry in the page table, false
+ * if the iterator was already at the end of the current page table.
  */
 static bool try_step_side(struct tdp_iter *iter)
 {
-	/*
-	 * Check if the iterator is already at the end of the current page
-	 * table.
-	 */
-	if (SPTE_INDEX(iter->gfn << PAGE_SHIFT, iter->level) ==
-	    (SPTE_ENT_PER_PAGE - 1))
-		return false;
+	while (SPTE_INDEX(iter->gfn << PAGE_SHIFT, iter->level) < (SPTE_ENT_PER_PAGE - 1)) {
+		iter->gfn += KVM_PAGES_PER_HPAGE(iter->level);
+		iter->next_last_level_gfn = iter->gfn;
+		iter->sptep++;
+		iter->old_spte = kvm_tdp_mmu_read_spte(iter->sptep);
 
-	iter->gfn += KVM_PAGES_PER_HPAGE(iter->level);
-	iter->next_last_level_gfn = iter->gfn;
-	iter->sptep++;
-	iter->old_spte = kvm_tdp_mmu_read_spte(iter->sptep);
-
-	return true;
+		if (!iter->only_shadow_present ||
+		    is_shadow_present_pte(iter->old_spte))
+			return true;
+	}
+	return false;
 }
 
 /*
diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
index 2880fd392e0c..11945ff42f50 100644
--- a/arch/x86/kvm/mmu/tdp_iter.h
+++ b/arch/x86/kvm/mmu/tdp_iter.h
@@ -105,6 +105,8 @@ struct tdp_iter {
 	int as_id;
 	/* A snapshot of the value at sptep */
 	u64 old_spte;
+	/* True if the walk should only visit shadow-present PTEs. */
+	bool only_shadow_present;
 	/*
 	 * Whether the iterator has a valid state. This will be false if the
 	 * iterator walks off the end of the paging structure.
@@ -122,18 +124,23 @@ struct tdp_iter {
  * Iterates over every SPTE mapping the GFN range [start, end) in a
  * preorder traversal.
  */
-#define for_each_tdp_pte_min_level(iter, root, min_level, start, end) \
-	for (tdp_iter_start(&iter, root, min_level, start); \
-	     iter.valid && iter.gfn < end;		     \
-	     tdp_iter_next(&iter))
+#define for_each_tdp_pte_min_level(iter, root, min_level, only_present, start, end)	\
+	for (tdp_iter_start(&iter, root, min_level, only_present, start);		\
+	     iter.valid && iter.gfn < end;						\
+	     tdp_iter_next(&iter))							\
+		if ((only_present) && !is_shadow_present_pte(iter.old_spte)) {		\
+		} else
+
+#define for_each_shadow_present_tdp_pte(iter, root, min_level, start, end)		\
+	for_each_tdp_pte_min_level(iter, root, min_level, true, start, end)
 
 #define for_each_tdp_pte(iter, root, start, end) \
-	for_each_tdp_pte_min_level(iter, root, PG_LEVEL_4K, start, end)
+	for_each_tdp_pte_min_level(iter, root, PG_LEVEL_4K, false, start, end)
 
 tdp_ptep_t spte_to_child_pt(u64 pte, int level);
 
 void tdp_iter_start(struct tdp_iter *iter, struct kvm_mmu_page *root,
-		    int min_level, gfn_t next_last_level_gfn);
+		    int min_level, bool only_present, gfn_t next_last_level_gfn);
 void tdp_iter_next(struct tdp_iter *iter);
 void tdp_iter_restart(struct tdp_iter *iter);
 
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 3b996c1fdaab..25a75db83ca3 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -752,14 +752,11 @@ static void __tdp_mmu_zap_root(struct kvm *kvm, struct kvm_mmu_page *root,
 	gfn_t end = tdp_mmu_max_gfn_exclusive();
 	gfn_t start = 0;
 
-	for_each_tdp_pte_min_level(iter, root, zap_level, start, end) {
+	for_each_shadow_present_tdp_pte(iter, root, zap_level, start, end) {
 retry:
 		if (tdp_mmu_iter_cond_resched(kvm, &iter, false, shared))
 			continue;
 
-		if (!is_shadow_present_pte(iter.old_spte))
-			continue;
-
 		if (iter.level > zap_level)
 			continue;
 
@@ -856,15 +853,14 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,
 
 	rcu_read_lock();
 
-	for_each_tdp_pte_min_level(iter, root, PG_LEVEL_4K, start, end) {
+	for_each_shadow_present_tdp_pte(iter, root, PG_LEVEL_4K, start, end) {
 		if (can_yield &&
 		    tdp_mmu_iter_cond_resched(kvm, &iter, flush, false)) {
 			flush = false;
 			continue;
 		}
 
-		if (!is_shadow_present_pte(iter.old_spte) ||
-		    !is_last_spte(iter.old_spte, iter.level))
+		if (!is_last_spte(iter.old_spte, iter.level))
 			continue;
 
 		tdp_mmu_iter_set_spte(kvm, &iter, SHADOW_NONPRESENT_VALUE);
@@ -1296,7 +1292,7 @@ static bool wrprot_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
 
 	BUG_ON(min_level > KVM_MAX_HUGEPAGE_LEVEL);
 
-	for_each_tdp_pte_min_level(iter, root, min_level, start, end) {
+	for_each_shadow_present_tdp_pte(iter, root, min_level, start, end) {
 retry:
 		if (tdp_mmu_iter_cond_resched(kvm, &iter, false, true))
 			continue;
@@ -1415,12 +1411,12 @@ static int tdp_mmu_split_huge_pages_root(struct kvm *kvm,
 	 * level above the target level (e.g. splitting a 1GB to 512 2MB pages,
 	 * and then splitting each of those to 512 4KB pages).
 	 */
-	for_each_tdp_pte_min_level(iter, root, target_level + 1, start, end) {
+	for_each_shadow_present_tdp_pte(iter, root, target_level + 1, start, end) {
 retry:
 		if (tdp_mmu_iter_cond_resched(kvm, &iter, false, shared))
 			continue;
 
-		if (!is_shadow_present_pte(iter.old_spte) || !is_large_pte(iter.old_spte))
+		if (!is_large_pte(iter.old_spte))
 			continue;
 
 		if (!sp) {
@@ -1626,13 +1622,12 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
 
 	rcu_read_lock();
 
-	for_each_tdp_pte_min_level(iter, root, PG_LEVEL_2M, start, end) {
+	for_each_shadow_present_tdp_pte(iter, root, PG_LEVEL_2M, start, end) {
 retry:
 		if (tdp_mmu_iter_cond_resched(kvm, &iter, false, true))
 			continue;
 
-		if (iter.level > KVM_MAX_HUGEPAGE_LEVEL ||
-		    !is_shadow_present_pte(iter.old_spte))
+		if (iter.level > KVM_MAX_HUGEPAGE_LEVEL)
 			continue;
 
 		/*
@@ -1696,9 +1691,8 @@ static bool write_protect_gfn(struct kvm *kvm, struct kvm_mmu_page *root,
 
 	rcu_read_lock();
 
-	for_each_tdp_pte_min_level(iter, root, min_level, gfn, gfn + 1) {
-		if (!is_shadow_present_pte(iter.old_spte) ||
-		    !is_last_spte(iter.old_spte, iter.level))
+	for_each_shadow_present_tdp_pte(iter, root, min_level, gfn, gfn + 1) {
+		if (!is_last_spte(iter.old_spte, iter.level))
 			continue;
 
 		new_spte = iter.old_spte &

base-commit: 8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b
-- 


> Signed-off-by: Bernhard Kauer <bk@alpico.io>
> ---
>  arch/x86/kvm/mmu/tdp_iter.h | 21 +++++++++++++++++++++
>  arch/x86/kvm/mmu/tdp_mmu.c  |  2 +-
>  2 files changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
> index 2880fd392e0c..7ad28ac2c6b8 100644
> --- a/arch/x86/kvm/mmu/tdp_iter.h
> +++ b/arch/x86/kvm/mmu/tdp_iter.h
> @@ -130,6 +130,27 @@ struct tdp_iter {
>  #define for_each_tdp_pte(iter, root, start, end) \
>  	for_each_tdp_pte_min_level(iter, root, PG_LEVEL_4K, start, end)
>  
> +
> +/*
> + * Skip up to count not present entries of the iterator. Returns true
> + * if the final entry is not present.
> + */
> +static inline bool tdp_iter_skip_not_present(struct tdp_iter *iter, int count)
> +{
> +	int i;
> +	int pos;
> +
> +	pos = SPTE_INDEX(iter->gfn << PAGE_SHIFT, iter->level);
> +	count = min(count, SPTE_ENT_PER_PAGE - 1 - pos);
> +	for (i = 0; i < count && !is_shadow_present_pte(iter->old_spte); i++)
> +		iter->old_spte = kvm_tdp_mmu_read_spte(iter->sptep + i + 1);
> +
> +	iter->gfn += i * KVM_PAGES_PER_HPAGE(iter->level);
> +	iter->next_last_level_gfn = iter->gfn;
> +	iter->sptep += i;
> +	return !is_shadow_present_pte(iter->old_spte);
> +}
> +
>  tdp_ptep_t spte_to_child_pt(u64 pte, int level);
>  
>  void tdp_iter_start(struct tdp_iter *iter, struct kvm_mmu_page *root,
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 1951f76db657..404726511f95 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -750,7 +750,7 @@ static void __tdp_mmu_zap_root(struct kvm *kvm, struct kvm_mmu_page *root,
>  		if (tdp_mmu_iter_cond_resched(kvm, &iter, false, shared))
>  			continue;
>  
> -		if (!is_shadow_present_pte(iter.old_spte))
> +		if (tdp_iter_skip_not_present(&iter, 32))
>  			continue;
>  
>  		if (iter.level > zap_level)
> -- 
> 2.45.2
> 

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] KVM: x86: Fast forward the iterator when zapping the TDP MMU
  2024-10-23 23:27 ` Sean Christopherson
@ 2024-10-24 10:25   ` Bernhard Kauer
  2024-10-28 16:44     ` Sean Christopherson
  0 siblings, 1 reply; 4+ messages in thread
From: Bernhard Kauer @ 2024-10-24 10:25 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Bernhard Kauer, kvm

On Wed, Oct 23, 2024 at 04:27:50PM -0700, Sean Christopherson wrote:
> On Wed, Oct 23, 2024, Bernhard Kauer wrote:
> > Zapping a root means scanning for present entries in a page-table
> > hierarchy. This process is relatively slow since it needs to be
> > preemptible as millions of entries might be processed.
> > 
> > Furthermore the root-page is traversed multiple times as zapping
> > is done with increasing page-sizes.
> > 
> > Optimizing for the not-present case speeds up the hello microbenchmark
> > by 115 microseconds.
> 
> What is the "hello" microbenchmark?  Do we actually care if it's faster?

Hello is a tiny kernel that just outputs "Hello world!" over a virtual
serial port and then shuts the VM down.  It is the minimal test-case that
reveals performance bottlenecks hard to see in the noise of a big system.

Does it matter?  The case I optimized might be only relevant for
short-running virtual machines.  However, you found more users of
the iterator that might benefit from it.

 
> Are you able to determine exactly what makes iteration slow? 

I've counted the loop and the number of entries removed:

	[24661.896626] zap root(0, 1) loops 3584 entries 2
	[24661.896655] zap root(0, 2) loops 2048 entries 3
	[24661.896709] zap root(0, 3) loops 1024 entries 2
	[24661.896750] zap root(0, 4) loops 512 entries 1
	[24661.896812] zap root(1, 1) loops 512 entries 0
	[24661.896856] zap root(1, 2) loops 512 entries 0
	[24661.896895] zap root(1, 3) loops 512 entries 0
	[24661.896938] zap root(1, 4) loops 512 entries 0


So for this simple case one needs 9216 iterations to go through 18 pagetables
with 512 entries each. My patch reduces this to 303 iterations.


	[24110.032368] zap root(0, 1) loops 118 entries 2
	[24110.032374] zap root(0, 2) loops 69 entries 3
	[24110.032419] zap root(0, 3) loops 35 entries 2
	[24110.032421] zap root(0, 4) loops 17 entries 1
	[24110.032434] zap root(1, 1) loops 16 entries 0
	[24110.032435] zap root(1, 2) loops 16 entries 0
	[24110.032437] zap root(1, 3) loops 16 entries 0
	[24110.032438] zap root(1, 4) loops 16 entries 0


Given the 115 microseconds one loop iteration is roughly 13 nanoseconds. 
With the updates to the iterator and the various checks this sounds
reasonable to me.  Simplifying the inner loop should help here.


> partly because maybe there's a more elegant solution.

Scanning can be avoided if one keeps track of the used entries.

 
> Regardless of why iteration is slow, I would much prefer to solve this for all
> users of the iterator.  E.g. very lightly tested, and not 100% optimized (though
> should be on par with the below).

Makes sense. I tried it out and it is a bit slower. One can optimize
the while loop in try_step_side() a bit further.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] KVM: x86: Fast forward the iterator when zapping the TDP MMU
  2024-10-24 10:25   ` Bernhard Kauer
@ 2024-10-28 16:44     ` Sean Christopherson
  0 siblings, 0 replies; 4+ messages in thread
From: Sean Christopherson @ 2024-10-28 16:44 UTC (permalink / raw)
  To: Bernhard Kauer; +Cc: kvm

On Thu, Oct 24, 2024, Bernhard Kauer wrote:
> On Wed, Oct 23, 2024 at 04:27:50PM -0700, Sean Christopherson wrote:
> > On Wed, Oct 23, 2024, Bernhard Kauer wrote:
> > > Zapping a root means scanning for present entries in a page-table
> > > hierarchy. This process is relatively slow since it needs to be
> > > preemptible as millions of entries might be processed.
> > > 
> > > Furthermore the root-page is traversed multiple times as zapping
> > > is done with increasing page-sizes.
> > > 
> > > Optimizing for the not-present case speeds up the hello microbenchmark
> > > by 115 microseconds.
> > 
> > What is the "hello" microbenchmark?  Do we actually care if it's faster?
> 
> Hello is a tiny kernel that just outputs "Hello world!" over a virtual
> serial port and then shuts the VM down.  It is the minimal test-case that
> reveals performance bottlenecks hard to see in the noise of a big system.
> 
> Does it matter?

Yes.  Knowing the behavior and use case helps guide judgment calls, e.g. for
balancing complexity and maintenance burden versus performance, and it also helps
readers understand what types of behaviors/workloads will benefit from the change.

> The case I optimized might be only relevant for short-running virtual
> machines.  However, you found more users of the iterator that might benefit
> from it.
>  
> > Are you able to determine exactly what makes iteration slow? 
> 
> I've counted the loop and the number of entries removed:
> 
> 	[24661.896626] zap root(0, 1) loops 3584 entries 2
> 	[24661.896655] zap root(0, 2) loops 2048 entries 3
> 	[24661.896709] zap root(0, 3) loops 1024 entries 2
> 	[24661.896750] zap root(0, 4) loops 512 entries 1
> 	[24661.896812] zap root(1, 1) loops 512 entries 0
> 	[24661.896856] zap root(1, 2) loops 512 entries 0
> 	[24661.896895] zap root(1, 3) loops 512 entries 0
> 	[24661.896938] zap root(1, 4) loops 512 entries 0
> 
> 
> So for this simple case one needs 9216 iterations to go through 18 pagetables
> with 512 entries each. My patch reduces this to 303 iterations.
> 
> 	[24110.032368] zap root(0, 1) loops 118 entries 2
> 	[24110.032374] zap root(0, 2) loops 69 entries 3
> 	[24110.032419] zap root(0, 3) loops 35 entries 2
> 	[24110.032421] zap root(0, 4) loops 17 entries 1
> 	[24110.032434] zap root(1, 1) loops 16 entries 0
> 	[24110.032435] zap root(1, 2) loops 16 entries 0
> 	[24110.032437] zap root(1, 3) loops 16 entries 0
> 	[24110.032438] zap root(1, 4) loops 16 entries 0
> 
> 
> Given the 115 microseconds one loop iteration is roughly 13 nanoseconds. 
> With the updates to the iterator and the various checks this sounds
> reasonable to me.  Simplifying the inner loop should help here.

Yeah, I was essentially wondering if we could optimize some of the checks at each
step.  E.g. untested, but I suspect that checking yielded_gfn to ensure forward
progress if and only if a resched is needed would improve overall throughput by
short circuiting on the common case (no resched needed), and by making that path
more predictable (returning false instead of iter->yielded).

---
 arch/x86/kvm/mmu/tdp_mmu.c | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 25a75db83ca3..15be07fcc5f9 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -706,31 +706,31 @@ static inline bool __must_check tdp_mmu_iter_cond_resched(struct kvm *kvm,
 							  struct tdp_iter *iter,
 							  bool flush, bool shared)
 {
-	WARN_ON_ONCE(iter->yielded);
+	KVM_MMU_WARN_ON(iter->yielded);
+
+	if (!need_resched() && !rwlock_needbreak(&kvm->mmu_lock))
+		return false;
 
 	/* Ensure forward progress has been made before yielding. */
 	if (iter->next_last_level_gfn == iter->yielded_gfn)
 		return false;
 
-	if (need_resched() || rwlock_needbreak(&kvm->mmu_lock)) {
-		if (flush)
-			kvm_flush_remote_tlbs(kvm);
+	if (flush)
+		kvm_flush_remote_tlbs(kvm);
 
-		rcu_read_unlock();
+	rcu_read_unlock();
 
-		if (shared)
-			cond_resched_rwlock_read(&kvm->mmu_lock);
-		else
-			cond_resched_rwlock_write(&kvm->mmu_lock);
+	if (shared)
+		cond_resched_rwlock_read(&kvm->mmu_lock);
+	else
+		cond_resched_rwlock_write(&kvm->mmu_lock);
 
-		rcu_read_lock();
+	rcu_read_lock();
 
-		WARN_ON_ONCE(iter->gfn > iter->next_last_level_gfn);
+	KVM_MMU_WARN_ON(iter->gfn > iter->next_last_level_gfn);
 
-		iter->yielded = true;
-	}
-
-	return iter->yielded;
+	iter->yielded = true;
+	return true;
 }
 
 static inline gfn_t tdp_mmu_max_gfn_exclusive(void)

base-commit: 80fef183d5a6283e50a50c2aff20dfb415366305
-- 

> > partly because maybe there's a more elegant solution.
> 
> Scanning can be avoided if one keeps track of the used entries.

Yeah, which is partly why I asked about the details of the benchmark.  Tracking
used entries outside of the page tables themselves adds complexity and likely
increases the latency of insertion operations.

> > Regardless of why iteration is slow, I would much prefer to solve this for all
> > users of the iterator.  E.g. very lightly tested, and not 100% optimized (though
> > should be on par with the below).
> 
> Makes sense. I tried it out and it is a bit slower. One can optimize
> the while loop in try_step_side() a bit further.

^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2024-10-28 16:44 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-23  9:18 [PATCH] KVM: x86: Fast forward the iterator when zapping the TDP MMU Bernhard Kauer
2024-10-23 23:27 ` Sean Christopherson
2024-10-24 10:25   ` Bernhard Kauer
2024-10-28 16:44     ` Sean Christopherson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox