Linux-ARM-Kernel Archive on lore.kernel.org
* [PATCH v2 0/3] Optimize S2 page splitting
@ 2026-06-18 13:14 Leonardo Bras
From: Leonardo Bras @ 2026-06-18 13:14 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Joey Gouly, Steffen Eiden,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon,
	Fuad Tabba, Leonardo Bras, Raghavendra Rao Ananta
  Cc: linux-arm-kernel, kvmarm, linux-kernel

While playing with dirty-bit tracking, I decided to take a look at how page
splitting works. I found that all entries are walked, even though we can
infer, for instance, that:
- If a level-3 entry is walked, it means the parent level-2 entry is split
- If a split just succeeded on a table entry, it means all child nodes
  are already split

The idea of this series is to introduce new walk flags to skip pagetable
levels 0-3.

The idea of skipping child nodes was also tested, but it was marginally
slower than just skipping levels, so it was discarded.

Optimization measured on two scenarios involving eager-splitting on a
VM with 1 memslot of 16GB:
- Scenario 1: No manual protect, whole memslot split at dirty-track enable
  (KVM_SET_USER_MEMORY_REGION2 ioctl with KVM_MEM_LOG_DIRTY_PAGES)
  - Split happens only once, whole region
  - Evaluates improved batch performance of splitting
- Scenario 2: Manual protect, split happens during every dirty-bit clean
  (KVM_CLEAR_DIRTY_LOG ioctl), average for 2 iterations.
  - Split called multiple times, for smaller 64-page sections.
  - Evaluates improved performance for multiple calls

Scenario 1, improvement on dirty-track enable ioctl for the memslot:
- Memory was already split (4k pages):  -44.01% runtime (stdev 2.80%)
- THP backed memory:                    -24.66% runtime (stdev 1.21%)
- 16x1GB hugetlb memory:                -24.78% runtime (stdev 0.85%)

Scenario 2, improvement on dirty-log clean ioctl for the memslot:
- Memory was already split (4k pages):  -38.98% runtime (stdev 1.91%)
- THP backed memory:                    -25.49% runtime (stdev 0.65%)
- 16x1GB hugetlb memory:                -24.24% runtime (stdev 0.65%)

To collect the numbers above, the following script was run on both vanilla
and patched kernels, with kernel parameter 'default_hugepagesz=1G', on a
TX2 with 32GB RAM.

--- dirty_test.sh
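#!/bin/bash
# Log file is named after the kernel release suffix (4th '-' field onward)
filename=$(uname -r | cut -d'-' -f 4-)

run_test(){
  # Record which kernel and command line produced the numbers
  uname -a
  cat /proc/cmdline

  # Prepare: reserve 64 hugepages of the default hugepage size
  sudo bash -c 'echo 64 > /proc/sys/vm/nr_hugepages'

  # Scenario 1: -g runs without manual dirty-log protect
  ./dirty_log_perf_test -g -b 64G
  ./dirty_log_perf_test -g -b 64G -s anonymous_thp
  ./dirty_log_perf_test -g -b 64G -s shared_hugetlb

  # Scenario 2: manual dirty-log protect (the selftest default)
  ./dirty_log_perf_test -b 64G
  ./dirty_log_perf_test -b 64G -s anonymous_thp
  ./dirty_log_perf_test -b 64G -s shared_hugetlb
}

# Keep both stdout and stderr in the per-kernel log
run_test 2>&1 | tee "${filename}"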
---

The dirty_log_perf_test command above is the standard KVM selftest found in
the kernel tree. It tested the following guest modes:
Testing guest mode: PA-bits:40,  VA-bits:48,  4K pages
Testing guest mode: PA-bits:40,  VA-bits:48, 64K pages
Testing guest mode: PA-bits:36,  VA-bits:48,  4K pages
Testing guest mode: PA-bits:36,  VA-bits:48, 64K pages

Performance numbers from the modes above were used to calculate the average
and stdev shown in the optimization results.

Changes since v1:
- Fixed inverted flag verification priority (Sashiko)
- Fixed incorrectly skipping the POST call if a level was skipped
  (Sashiko), and for that:
- New pre-patch that changes goto-out -> return to avoid re-testing
  walk_continue
v1 Link: https://lore.kernel.org/lkml/20260610202112.2695205-2-leo.bras@arm.com/

Changes since RFC:
- Changed approach from return value to walk flags (Will Deacon)
- Discarded skip_child approach (Oliver Upton)
- Measured on real hardware, and from userspace perspective (Marc Zyngier)
- Better explanation of what and how numbers were collected
RFC Link: https://lore.kernel.org/all/20260515195904.2466381-1-leo.bras@arm.com/

Thanks!
Leo

Leonardo Bras (3):
  KVM: arm64: Avoid re-testing walk_continue
  KVM: arm64: Introduce KVM_PGTABLE_WALK_SKIP_LEVEL* walk flags
  KVM: arm64: Make stage2_split_walker() skip unnecessary walks

 arch/arm64/include/asm/kvm_pgtable.h | 13 +++++++++++++
 arch/arm64/kvm/hyp/pgtable.c         | 28 +++++++++++++++++++++-------
 2 files changed, 34 insertions(+), 7 deletions(-)


base-commit: 66affa37cfac0aec061cc4bcf4a065b0c52f7e19
-- 
2.54.0




Thread overview: 5+ messages
2026-06-18 13:14 [PATCH v2 0/3] Optimize S2 page splitting Leonardo Bras
2026-06-18 13:14 ` [PATCH v2 1/3] KVM: arm64: Avoid re-testing walk_continue Leonardo Bras
2026-06-18 13:14 ` [PATCH v2 2/3] KVM: arm64: Introduce KVM_PGTABLE_WALK_SKIP_LEVEL* walk flags Leonardo Bras
2026-06-18 13:14 ` [PATCH v2 3/3] KVM: arm64: Make stage2_split_walker() skip unnecessary walks Leonardo Bras
2026-06-18 14:38 ` [PATCH v2 0/3] Optimize S2 page splitting Leonardo Bras
