[PATCH v1 0/2] Optimize S2 page splitting

* [PATCH v1 0/2] Optimize S2 page splitting
@ 2026-06-10 20:21 Leonardo Bras
  2026-06-10 20:21 ` [PATCH v1 1/2] KVM: arm64: Introduce KVM_PGTABLE_WALK_SKIP_LEVEL* walk flags Leonardo Bras
  2026-06-10 20:21 ` [PATCH v1 2/2] KVM: arm64: Make stage2_split_walker() skip unnecessary walks Leonardo Bras
From: Leonardo Bras @ 2026-06-10 20:21 UTC
  To: Marc Zyngier, Oliver Upton, Joey Gouly, Steffen Eiden,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon,
	Fuad Tabba, Leonardo Bras, Raghavendra Rao Ananta
  Cc: linux-arm-kernel, kvmarm, linux-kernel

While playing with dirty-bit tracking, I decided to take a look at how page
splitting works, and found that all entries are walked, even though we can
infer, for instance, that:
- If a level-3 entry is walked, its parent level-2 entry is already split
- If a split just succeeded on a table entry, all its child nodes
  are already split

The idea of this series is to introduce new walk flags that skip
page-table levels 0-3.
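
To make the flag semantics concrete, here is a minimal user-space sketch (not kernel code) of how a SKIP_LEVELn flag bounds a walk. The flag bits are copied from patch 1/2 but re-declared under hypothetical short names, and the threshold uses ffs() as agreed later in this thread; with a single flag set, as patch 2/2 does, ffs() and fls() give the same answer. deepest_level() is a hypothetical helper that only mimics the "check level + 1 before descending" rule.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values from patch 1/2 (BIT(7)..BIT(10)), re-declared with
 * hypothetical user-space names so this sketch builds outside the kernel. */
#define SKIP_LEVEL0	(1u << 7)
#define SKIP_LEVEL1	(1u << 8)
#define SKIP_LEVEL2	(1u << 9)
#define SKIP_LEVEL3	(1u << 10)
#define SKIP_LEVELS	(SKIP_LEVEL0 | SKIP_LEVEL1 | SKIP_LEVEL2 | SKIP_LEVEL3)

/* True if entries at @level must not be visited: SKIP_LEVELn skips level n
 * and every deeper level. __builtin_ffs() is 1-based like kernel ffs(). */
static bool skip_level(int level, unsigned int flags)
{
	flags &= SKIP_LEVELS;
	if (!flags)
		return false;
	return level >= __builtin_ffs((int)flags) - __builtin_ffs((int)SKIP_LEVELS);
}

/* Deepest level reached by a walk rooted at @start_level. As in the patch,
 * the walker tests level + 1 before descending, so the root level itself
 * is always visited. */
static int deepest_level(int start_level, unsigned int flags)
{
	int level = start_level;

	assert(start_level >= -1 && start_level <= 3);
	while (level < 3 && !skip_level(level + 1, flags))
		level++;
	return level;
}
```

For example, deepest_level(0, SKIP_LEVEL3) is 2: the walk visits levels 0-2 and never enters level-3 tables, while deepest_level(-1, SKIP_LEVEL0) stays at the -1 root.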

The idea of skipping child nodes was also tested, but it was marginally
slower than just skipping levels, so it was discarded.

Optimization measured on two scenarios involving eager-splitting on a
VM with 1 memslot of 64GB:
- Scenario 1: No manual protect, whole memslot split at dirty-track enable
  (KVM_SET_USER_MEMORY_REGION2 ioctl with KVM_MEM_LOG_DIRTY_PAGES)
  - Split happens only once, for the whole region
  - Evaluates the improvement in batch splitting performance
- Scenario 2: Manual protect, split happens during every dirty-bit clean
  (KVM_CLEAR_DIRTY_LOG ioctl), averaged over 2 iterations.
  - Split is called multiple times, on smaller 64-page sections
  - Evaluates the improvement across multiple calls

Scenario 1, improvement on dirty-track enable ioctl for the memslot:
- Memory was already split (4k pages):  -35.47% runtime (stdev 5.63%)
- THP backed memory:                    -11.94% runtime (stdev 2.55%)
- 64x1GB hugetlb memory:                -14.46% runtime (stdev 2.68%)

Scenario 2, improvement on dirty-log clean ioctl for the memslot:
- Memory was already split (4k pages):  -26.36% runtime (stdev 3.32%)
- THP backed memory:                    -12.05% runtime (stdev 0.37%)
- 64x1GB hugetlb memory:                -13.87% runtime (stdev 0.86%)

To collect the numbers above, the following script was run on both vanilla
and patched kernels, with the kernel parameter 'default_hugepagesz=1G', on an
AmpereOne with 256GB RAM.

--- dirty_test.sh
#!/bin/bash
filename=$(uname -r |cut -d'-' -f 4-)

run_test(){
  uname -a
  cat /proc/cmdline

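  # Reserve 64 x 1GB hugetlb pages (default_hugepagesz=1G) for shared_hugetlb
  sudo bash -c 'echo 64 > /proc/sys/vm/nr_hugepages'

  # Scenario 1: -g disables KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 (no manual protect)
  ./dirty_log_perf_test -g -b 64G
  ./dirty_log_perf_test -g -b 64G -s anonymous_thp
  ./dirty_log_perf_test -g -b 64G -s shared_hugetlb

  # Scenario 2: manual protect (default), split on KVM_CLEAR_DIRTY_LOG
  ./dirty_log_perf_test -b 64G
  ./dirty_log_perf_test -b 64G -s anonymous_thp
  ./dirty_log_perf_test -b 64G -s shared_hugetlb
}

run_test 2>&1 | tee "${filename}"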
---

The dirty_log_perf_test binary above is the standard KVM selftest found in the
kernel tree (tools/testing/selftests/kvm). It tested the following guest modes:
Testing guest mode: PA-bits:48,  VA-bits:48,  4K pages
Testing guest mode: PA-bits:48,  VA-bits:48, 16K pages
Testing guest mode: PA-bits:48,  VA-bits:48, 64K pages
Testing guest mode: PA-bits:40,  VA-bits:48,  4K pages
Testing guest mode: PA-bits:40,  VA-bits:48, 16K pages
Testing guest mode: PA-bits:40,  VA-bits:48, 64K pages

Performance numbers from the modes above were used to calculate the average
and stdev shown in the optimization results.

Changes since v1:
- Changed approach from return value to walk flags (Will Deacon)
- Discarded skip_child approach (Oliver Upton)
- Measured on real hardware, and from the userspace perspective (Marc Zyngier)
- Better explanation of which numbers were collected, and how
v1 Link: https://lore.kernel.org/all/20260515195904.2466381-1-leo.bras@arm.com/

Thanks!
Leo

Leonardo Bras (2):
  KVM: arm64: Introduce KVM_PGTABLE_WALK_SKIP_LEVEL* walk flags
  KVM: arm64: Make stage2_split_walker() skip unnecessary walks

 arch/arm64/include/asm/kvm_pgtable.h | 13 +++++++++++++
 arch/arm64/kvm/hyp/pgtable.c         | 18 ++++++++++++++++--
 2 files changed, 29 insertions(+), 2 deletions(-)


base-commit: acb7500801e98639f6d8c2d796ed9f64cba83d3a
-- 
2.54.0

* [PATCH v1 1/2] KVM: arm64: Introduce KVM_PGTABLE_WALK_SKIP_LEVEL* walk flags
  2026-06-10 20:21 [PATCH v1 0/2] Optimize S2 page splitting Leonardo Bras
@ 2026-06-10 20:21 ` Leonardo Bras
  2026-06-10 20:30   ` sashiko-bot
  2026-06-10 20:21 ` [PATCH v1 2/2] KVM: arm64: Make stage2_split_walker() skip unnecessary walks Leonardo Bras
From: Leonardo Bras @ 2026-06-10 20:21 UTC
  To: Marc Zyngier, Oliver Upton, Joey Gouly, Steffen Eiden,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon,
	Fuad Tabba, Leonardo Bras, Raghavendra Rao Ananta
  Cc: linux-arm-kernel, kvmarm, linux-kernel

Add the new walking flags that tell kvm_pgtable_walk() to skip lower levels
when walking the pagetables.

Signed-off-by: Leonardo Bras <leo.bras@arm.com>
---
 arch/arm64/include/asm/kvm_pgtable.h | 13 +++++++++++++
 arch/arm64/kvm/hyp/pgtable.c         | 15 ++++++++++++++-
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 41a8687938eb..20c7c12e0e76 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -311,31 +311,44 @@ typedef bool (*kvm_pgtable_force_pte_cb_t)(u64 addr, u64 end,
  * @KVM_PGTABLE_WALK_SHARED:		Indicates the page-tables may be shared
  *					with other software walkers.
  * @KVM_PGTABLE_WALK_IGNORE_EAGAIN:	Don't terminate the walk early if
  *					the walker returns -EAGAIN.
  * @KVM_PGTABLE_WALK_SKIP_BBM_TLBI:	Visit and update table entries
  *					without Break-before-make's
  *					TLB invalidation.
  * @KVM_PGTABLE_WALK_SKIP_CMO:		Visit and update table entries
  *					without Cache maintenance
  *					operations required.
+ * @KVM_PGTABLE_WALK_SKIP_LEVEL0:	Skip visiting level-0+ entries
+ * @KVM_PGTABLE_WALK_SKIP_LEVEL1:	Skip visiting level-1+ entries
+ * @KVM_PGTABLE_WALK_SKIP_LEVEL2:	Skip visiting level-2+ entries
+ * @KVM_PGTABLE_WALK_SKIP_LEVEL3:	Skip visiting level-3 entries
  */
 enum kvm_pgtable_walk_flags {
 	KVM_PGTABLE_WALK_LEAF			= BIT(0),
 	KVM_PGTABLE_WALK_TABLE_PRE		= BIT(1),
 	KVM_PGTABLE_WALK_TABLE_POST		= BIT(2),
 	KVM_PGTABLE_WALK_SHARED			= BIT(3),
 	KVM_PGTABLE_WALK_IGNORE_EAGAIN		= BIT(4),
 	KVM_PGTABLE_WALK_SKIP_BBM_TLBI		= BIT(5),
 	KVM_PGTABLE_WALK_SKIP_CMO		= BIT(6),
+	KVM_PGTABLE_WALK_SKIP_LEVEL0		= BIT(7),
+	KVM_PGTABLE_WALK_SKIP_LEVEL1		= BIT(8),
+	KVM_PGTABLE_WALK_SKIP_LEVEL2		= BIT(9),
+	KVM_PGTABLE_WALK_SKIP_LEVEL3		= BIT(10),
 };
 
+#define KVM_PGTABLE_WALK_SKIP_LEVELS 	(KVM_PGTABLE_WALK_SKIP_LEVEL0 | \
+					 KVM_PGTABLE_WALK_SKIP_LEVEL1 | \
+					 KVM_PGTABLE_WALK_SKIP_LEVEL2 | \
+					 KVM_PGTABLE_WALK_SKIP_LEVEL3 )
+
 struct kvm_pgtable_visit_ctx {
 	kvm_pte_t				*ptep;
 	kvm_pte_t				old;
 	void					*arg;
 	struct kvm_pgtable_mm_ops		*mm_ops;
 	u64					start;
 	u64					addr;
 	u64					end;
 	s8					level;
 	enum kvm_pgtable_walk_flags		flags;
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 91a7dfad6686..48d88a290a53 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -137,20 +137,33 @@ static bool kvm_pgtable_walk_continue(const struct kvm_pgtable_walker *walker,
 	 * Ignore the return code altogether for walkers outside a fault handler
 	 * (e.g. write protecting a range of memory) and chug along with the
 	 * page table walk.
 	 */
 	if (r == -EAGAIN)
 		return walker->flags & KVM_PGTABLE_WALK_IGNORE_EAGAIN;
 
 	return !r;
 }
 
+static __always_inline bool kvm_pgtable_skip_level(s8 level, enum kvm_pgtable_walk_flags flags)
+{
+	flags &= KVM_PGTABLE_WALK_SKIP_LEVELS;
+
+	if (likely(!flags))
+		return false;
+
+	if (level >= (fls(flags) - ffs(KVM_PGTABLE_WALK_SKIP_LEVELS)))
+		return true;
+
+	return false;
+}
+
 static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
 			      struct kvm_pgtable_mm_ops *mm_ops, kvm_pteref_t pgtable, s8 level);
 
 static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
 				      struct kvm_pgtable_mm_ops *mm_ops,
 				      kvm_pteref_t pteref, s8 level)
 {
 	enum kvm_pgtable_walk_flags flags = data->walker->flags;
 	kvm_pte_t *ptep = kvm_dereference_pteref(data->walker, pteref);
 	struct kvm_pgtable_visit_ctx ctx = {
@@ -185,21 +198,21 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
 	 * into a newly installed or replaced table.
 	 */
 	if (reload) {
 		ctx.old = READ_ONCE(*ptep);
 		table = kvm_pte_table(ctx.old, level);
 	}
 
 	if (!kvm_pgtable_walk_continue(data->walker, ret))
 		goto out;
 
-	if (!table) {
+	if (!table || kvm_pgtable_skip_level(level + 1, ctx.flags)) {
 		data->addr = ALIGN_DOWN(data->addr, kvm_granule_size(level));
 		data->addr += kvm_granule_size(level);
 		goto out;
 	}
 
 	childp = (kvm_pteref_t)kvm_pte_follow(ctx.old, mm_ops);
 	ret = __kvm_pgtable_walk(data, mm_ops, childp, level + 1);
 	if (!kvm_pgtable_walk_continue(data->walker, ret))
 		goto out;
 
-- 
2.54.0

* Re: [PATCH v1 1/2] KVM: arm64: Introduce KVM_PGTABLE_WALK_SKIP_LEVEL* walk flags
  2026-06-10 20:21 ` [PATCH v1 1/2] KVM: arm64: Introduce KVM_PGTABLE_WALK_SKIP_LEVEL* walk flags Leonardo Bras
@ 2026-06-10 20:30   ` sashiko-bot
  2026-06-17 13:30     ` Leonardo Bras
From: sashiko-bot @ 2026-06-10 20:30 UTC
  To: Leonardo Bras; +Cc: Oliver Upton, Marc Zyngier, kvmarm

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [Medium] Combining skip flags causes the walker to ignore the more restrictive skip levels due to fls() usage.
- [Medium] Skipping a page table level bypasses the KVM_PGTABLE_WALK_TABLE_POST callback, breaking PRE/POST callback symmetry.
- [Low] The skip logic cannot prevent the root level from being visited, making SKIP_LEVEL0 ineffective when start_level == 0.
--

commit 4e6415e54e22d985a0c97622b8e8b28b4c84b54b
Author: Leonardo Bras <leo.bras@arm.com>

KVM: arm64: Introduce KVM_PGTABLE_WALK_SKIP_LEVEL* walk flags

Add the new walking flags that tell kvm_pgtable_walk() to skip lower levels
when walking the pagetables.

> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 91a7dfad66866..48d88a290a538 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c

[ ... ]

> @@ -144,6 +144,19 @@ static bool kvm_pgtable_walk_continue(const struct kvm_pgtable_walker *walker,
>  	return !r;
>  }
>  
> +static __always_inline bool kvm_pgtable_skip_level(s8 level, enum kvm_pgtable_walk_flags flags)
> +{
> +	flags &= KVM_PGTABLE_WALK_SKIP_LEVELS;
> +
> +	if (likely(!flags))
> +		return false;
> +
> +	if (level >= (fls(flags) - ffs(KVM_PGTABLE_WALK_SKIP_LEVELS)))

[Severity: Medium]
Will this evaluate to the highest level if multiple skip flags are ORed
together?

If a caller combines flags like KVM_PGTABLE_WALK_SKIP_LEVEL1 and
KVM_PGTABLE_WALK_SKIP_LEVEL2, fls() will select the highest set bit. This
means the threshold evaluates to level 2, effectively ignoring the more
restrictive SKIP_LEVEL1 flag and incorrectly descending into levels the
caller requested to skip.

Should kvm_pgtable_skip_level() use ffs() instead of fls() to select the
lowest requested level as the definitive stopping point?

> +		return true;
> +
> +	return false;
> +}

[ ... ]

> @@ -192,7 +205,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>  	if (!kvm_pgtable_walk_continue(data->walker, ret))
>  		goto out;
>  
> -	if (!table) {
> +	if (!table || kvm_pgtable_skip_level(level + 1, ctx.flags)) {

[Severity: Medium]
Does skipping the level here bypass the KVM_PGTABLE_WALK_TABLE_POST callback?

If the current entry is a table, the KVM_PGTABLE_WALK_TABLE_PRE callback is
executed earlier in __kvm_pgtable_visit(). If kvm_pgtable_skip_level() then
evaluates to true, the code executes an early exit via goto out.

This completely bypasses the KVM_PGTABLE_WALK_TABLE_POST callback at the
end of the function, leaving the TABLE_PRE call unbalanced. If a walker
relies on symmetric PRE and POST callbacks to clean up resources or track
traversal depth, could this asymmetric execution flow lead to resource leaks
or state inconsistency?

[Severity: Low]
Does checking level + 1 prevent the root level from being skipped?

The walker unconditionally calls __kvm_pgtable_visit() for entries at the
root page table (start_level). Because the skip evaluation checks level + 1,
it only governs whether the walker descends into the children of the current
level.

If start_level == 0 and the walker requests SKIP_LEVEL0, won't the walker
still visit level 0 and invoke its callbacks, making SKIP_LEVEL0 functionally
identical to SKIP_LEVEL1 in this case?

>  		data->addr = ALIGN_DOWN(data->addr, kvm_granule_size(level));
>  		data->addr += kvm_granule_size(level);
>  		goto out;

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260610202112.2695205-2-leo.bras@arm.com?part=1

* Re: [PATCH v1 1/2] KVM: arm64: Introduce KVM_PGTABLE_WALK_SKIP_LEVEL* walk flags
  2026-06-10 20:30   ` sashiko-bot
@ 2026-06-17 13:30     ` Leonardo Bras
From: Leonardo Bras @ 2026-06-17 13:30 UTC
  To: kvmarm; +Cc: Leonardo Bras, Oliver Upton, Marc Zyngier

On Wed, Jun 10, 2026 at 08:30:42PM +0000, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
> - [Medium] Combining skip flags causes the walker to ignore the more restrictive skip levels due to fls() usage.
> - [Medium] Skipping a page table level bypasses the KVM_PGTABLE_WALK_TABLE_POST callback, breaking PRE/POST callback symmetry.
> - [Low] The skip logic cannot prevent the root level from being visited, making SKIP_LEVEL0 ineffective when start_level == 0.
> --
> 
> commit 4e6415e54e22d985a0c97622b8e8b28b4c84b54b
> Author: Leonardo Bras <leo.bras@arm.com>
> 
> KVM: arm64: Introduce KVM_PGTABLE_WALK_SKIP_LEVEL* walk flags
> 
> Add the new walking flags that tell kvm_pgtable_walk() to skip lower levels
> when walking the pagetables.
> 
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index 91a7dfad66866..48d88a290a538 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> 
> [ ... ]
> 
> > @@ -144,6 +144,19 @@ static bool kvm_pgtable_walk_continue(const struct kvm_pgtable_walker *walker,
> >  	return !r;
> >  }
> >  
> > +static __always_inline bool kvm_pgtable_skip_level(s8 level, enum kvm_pgtable_walk_flags flags)
> > +{
> > +	flags &= KVM_PGTABLE_WALK_SKIP_LEVELS;
> > +
> > +	if (likely(!flags))
> > +		return false;
> > +
> > +	if (level >= (fls(flags) - ffs(KVM_PGTABLE_WALK_SKIP_LEVELS)))
> 
> [Severity: Medium]
> Will this evaluate to the highest level if multiple skip flags are ORed
> together?
> 
> If a caller combines flags like KVM_PGTABLE_WALK_SKIP_LEVEL1 and
> KVM_PGTABLE_WALK_SKIP_LEVEL2, fls() will select the highest set bit. This
> means the threshold evaluates to level 2, effectively ignoring the more
> restrictive SKIP_LEVEL1 flag and incorrectly descending into levels the
> caller requested to skip.
> 
> Should kvm_pgtable_skip_level() use ffs() instead of fls() to select the
> lowest requested level as the definitive stopping point?
> 

Correct. That line should use ffs() instead of fls(), so:
+   if (level >= (ffs(flags) - ffs(KVM_PGTABLE_WALK_SKIP_LEVELS)))

That will select the lowest level, instead of the highest.

Since the tests were done with only a single skip-level flag set, ffs() and
fls() return the same bit, so the performance results should not be affected.
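
A quick user-space check of the difference, as a sketch: the GCC/Clang builtins stand in for the kernel's 1-based ffs()/fls(), and the flag names are hypothetical short copies of the patch's bit values. threshold_posted() and threshold_fixed() are illustrative helpers returning the first skipped level before and after the s/fls/ffs/ change proposed above.

```c
#include <assert.h>

#define SKIP_LEVEL1	(1u << 8)
#define SKIP_LEVEL2	(1u << 9)
#define SKIP_LEVELS	((1u << 7) | (1u << 8) | (1u << 9) | (1u << 10))

/* Stand-ins for the kernel's 1-based ffs()/fls(); both return 0 for 0. */
static int k_ffs(unsigned int x) { return __builtin_ffs((int)x); }
static int k_fls(unsigned int x) { return x ? 32 - __builtin_clz(x) : 0; }

/* First skipped level as computed by the posted patch (fls) ... */
static int threshold_posted(unsigned int flags)
{
	flags &= SKIP_LEVELS;
	assert(flags);		/* the patch returns early when no flag is set */
	return k_fls(flags) - k_ffs(SKIP_LEVELS);
}

/* ... and with fls() replaced by ffs(). */
static int threshold_fixed(unsigned int flags)
{
	flags &= SKIP_LEVELS;
	assert(flags);
	return k_ffs(flags) - k_ffs(SKIP_LEVELS);
}
```

With SKIP_LEVEL2 alone both helpers return 2; with SKIP_LEVEL1 | SKIP_LEVEL2 the posted version returns 2 while the ffs() version returns 1, honouring the shallower request.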

> > +		return true;
> > +
> > +	return false;
> > +}
> 
> [ ... ]
> 
> > @@ -192,7 +205,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
> >  	if (!kvm_pgtable_walk_continue(data->walker, ret))
> >  		goto out;
> >  
> > -	if (!table) {
> > +	if (!table || kvm_pgtable_skip_level(level + 1, ctx.flags)) {
> 
> [Severity: Medium]
> Does skipping the level here bypass the KVM_PGTABLE_WALK_TABLE_POST callback?
> 
> If the current entry is a table, the KVM_PGTABLE_WALK_TABLE_PRE callback is
> executed earlier in __kvm_pgtable_visit(). If kvm_pgtable_skip_level() then
> evaluates to true, the code executes an early exit via goto out.
> 
> This completely bypasses the KVM_PGTABLE_WALK_TABLE_POST callback at the
> end of the function, leaving the TABLE_PRE call unbalanced. If a walker
> relies on symmetric PRE and POST callbacks to clean up resources or track
> traversal depth, could this asymmetric execution flow lead to resource leaks
> or state inconsistency?

Correct. PRE and POST are called only on table entries, and when a skip-level
flag causes the early exit, the POST callback is skipped entirely, which is
undesired.

To fix that, I propose the following:
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -209,24 +209,24 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
        childp = (kvm_pteref_t)kvm_pte_follow(ctx.old, mm_ops);
        ret = __kvm_pgtable_walk(data, mm_ops, childp, level + 1);
        if (!kvm_pgtable_walk_continue(data->walker, ret))
                goto out;

-       if (ctx.flags & KVM_PGTABLE_WALK_TABLE_POST)
+out:
+       if (table && ctx.flags & KVM_PGTABLE_WALK_TABLE_POST)
                ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_TABLE_POST);

-out:
---

I expect the compiler to optimize the extra test well, so there should not be
much difference in performance.

The only other alternative I can see would be to add an extra label and a
copy of the data->addr increment.
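
The control-flow change can be checked with a toy model that only counts callbacks. Everything in it is hypothetical: table stands in for kvm_pte_table(), skip for kvm_pgtable_skip_level(level + 1, ...), and the struct is not a kernel type. One thing the toy leaves out: with out: moved above the POST call, the existing error exits (goto out after a failed kvm_pgtable_walk_continue()) would also reach TABLE_POST and could overwrite ret, so those paths may deserve a second look.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters for a walker that pairs TABLE_PRE with TABLE_POST
 * (e.g. to track depth); not a kernel structure. */
struct cb_count {
	int pre;
	int post;
};

/*
 * Toy model of the tail of __kvm_pgtable_visit(). @fixed selects the
 * proposed layout with "out:" placed above the POST call.
 */
static void visit(struct cb_count *c, bool table, bool skip, bool fixed)
{
	if (table)
		c->pre++;		/* TABLE_PRE callback */

	if (!table || skip)
		goto out;		/* advance data->addr, don't descend */

	/* ... __kvm_pgtable_walk() on the child table would run here ... */

	if (!fixed)
		c->post++;		/* TABLE_POST in the posted patch */
out:
	if (fixed && table)
		c->post++;		/* TABLE_POST after the proposed fix */
}
```

A skipped table entry leaves pre == 1 and post == 0 in the posted layout, while the proposed layout balances both counters; leaf entries (table == false) still get neither callback.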

> 
> [Severity: Low]
> Does checking level + 1 prevent the root level from being skipped?
> 
> The walker unconditionally calls __kvm_pgtable_visit() for entries at the
> root page table (start_level). Because the skip evaluation checks level + 1,
> it only governs whether the walker descends into the children of the current
> level.
> 
> If start_level == 0 and the walker requests SKIP_LEVEL0, won't the walker
> still visit level 0 and invoke its callbacks, making SKIP_LEVEL0 functionally
> identical to SKIP_LEVEL1 in this case?

Correct, but I don't see the point of making them behave differently.
SKIP_LEVEL0 is meant to be useful when the root level is -1.
What would be the use case for a flag that skips the root level? Why would
the page-table walk be called at all, then?


> 
> >  		data->addr = ALIGN_DOWN(data->addr, kvm_granule_size(level));
> >  		data->addr += kvm_granule_size(level);
> >  		goto out;
> 
> -- 
> Sashiko AI review · https://sashiko.dev/#/patchset/20260610202112.2695205-2-leo.bras@arm.com?part=1

Thanks!
Leo

* [PATCH v1 2/2] KVM: arm64: Make stage2_split_walker() skip unnecessary walks
  2026-06-10 20:21 [PATCH v1 0/2] Optimize S2 page splitting Leonardo Bras
  2026-06-10 20:21 ` [PATCH v1 1/2] KVM: arm64: Introduce KVM_PGTABLE_WALK_SKIP_LEVEL* walk flags Leonardo Bras
@ 2026-06-10 20:21 ` Leonardo Bras
From: Leonardo Bras @ 2026-06-10 20:21 UTC
  To: Marc Zyngier, Oliver Upton, Joey Gouly, Steffen Eiden,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon,
	Fuad Tabba, Leonardo Bras, Raghavendra Rao Ananta
  Cc: linux-arm-kernel, kvmarm, linux-kernel

Currently, when splitting a hugepage, all its child and sibling nodes
are walked, with the walker just returning early if there is nothing
to do. This means the walker callback is invoked for every leaf entry
in the range being split, even level-3 entries.

Optimize splitting by skipping all level-3 entries, as they already map the
smallest granule and can't be split any further (i.e. set the
KVM_PGTABLE_WALK_SKIP_LEVEL3 flag).
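
A back-of-the-envelope count shows why skipping level 3 matters for an already-split region. It is a sketch under stated assumptions: a 4K granule (512 entries per table), a walk rooted at level 0, and a 64GB range aligned to 1GB. It only counts entries visited, not time spent, and the 16K/64K guest modes measured in the cover letter have different fan-outs. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_SIZE	4096ULL	/* assumed 4K granule: level-3 entry size */
#define FANOUT		512ULL	/* entries per table with a 4K granule */

/* Entries visited at @level for a fully split, aligned range of @size
 * bytes, rounding up so a partially covered entry still counts once. */
static uint64_t entries_at_level(uint64_t size, int level)
{
	uint64_t span = PTE_SIZE;	/* bytes mapped by one entry at @level */

	assert(level >= 0 && level <= 3);
	for (int l = 3; l > level; l--)
		span *= FANOUT;
	return (size + span - 1) / span;
}

/* Total entries visited by a walk rooted at level 0 that goes no deeper
 * than @last_level (3 = vanilla walk, 2 = with SKIP_LEVEL3). */
static uint64_t visits(uint64_t size, int last_level)
{
	uint64_t total = 0;

	for (int level = 0; level <= last_level; level++)
		total += entries_at_level(size, level);
	return total;
}
```

For 64GB, visits(64ULL << 30, 3) is 16,810,049 entries, while visits(64ULL << 30, 2), the SKIP_LEVEL3 case, is 32,833.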

Optimization measured on two scenarios involving eager-splitting on a
VM with 1 memslot of 64GB:
- Scenario 1: No manual protect, whole memslot split at dirty-track enable
  (KVM_SET_USER_MEMORY_REGION2 ioctl with KVM_MEM_LOG_DIRTY_PAGES)
- Scenario 2: Manual protect, split happens during dirty-bit clean
  (KVM_CLEAR_DIRTY_LOG ioctl), averaged over 2 iterations.

Scenario 1, improvement on dirty-track enable for the memslot:
- Memory was already split (4k pages):  -35.47% runtime
- THP backed memory:                    -11.94% runtime
- 64x1GB hugetlb memory:                -14.46% runtime

Scenario 2, improvement on dirty-log clean for the memslot:
- Memory was already split (4k pages):  -26.36% runtime
- THP backed memory:                    -12.05% runtime
- 64x1GB hugetlb memory:                -13.87% runtime

Signed-off-by: Leonardo Bras <leo.bras@arm.com>
---
 arch/arm64/kvm/hyp/pgtable.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 48d88a290a53..70103934a04a 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -1565,21 +1565,22 @@ static int stage2_split_walker(const struct kvm_pgtable_visit_ctx *ctx,
 	new = kvm_init_table_pte(childp, mm_ops);
 	stage2_make_pte(ctx, new);
 	return 0;
 }
 
 int kvm_pgtable_stage2_split(struct kvm_pgtable *pgt, u64 addr, u64 size,
 			     struct kvm_mmu_memory_cache *mc)
 {
 	struct kvm_pgtable_walker walker = {
 		.cb	= stage2_split_walker,
-		.flags	= KVM_PGTABLE_WALK_LEAF,
+		.flags	= KVM_PGTABLE_WALK_LEAF |
+			  KVM_PGTABLE_WALK_SKIP_LEVEL3,
 		.arg	= mc,
 	};
 	int ret;
 
 	ret = kvm_pgtable_walk(pgt, addr, size, &walker);
 	dsb(ishst);
 	return ret;
 }
 
 int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
-- 
2.54.0