Kernel KVM virtualization development
* [PATCH v2 00/13] KVM Dirty-bit cleaning hw accelerator (HACDBS)
@ 2026-06-29 11:17 Leonardo Bras
  2026-06-29 11:17 ` [PATCH v2 01/13] KVM: arm64: HDBSS bits Leonardo Bras
                   ` (12 more replies)
  0 siblings, 13 replies; 47+ messages in thread
From: Leonardo Bras @ 2026-06-29 11:17 UTC
  To: Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
	Joey Gouly, Steffen Eiden, Suzuki K Poulose, Zenghui Yu,
	Rafael J. Wysocki, Len Brown, Saket Dumbre, Paolo Bonzini,
	Jonathan Cameron, Chengwen Feng, Leonardo Bras, Kees Cook,
	Mikołaj Lenczewski, James Morse, Zeng Heng, mrigendrachaubey,
	Thomas Huth, Ryan Roberts, Yeoreum Yun, Mark Brown, Kevin Brodsky,
	James Clark, Fuad Tabba, Raghavendra Rao Ananta,
	Lorenzo Pieralisi, Sascha Bischoff, Anshuman Khandual, Tian Zheng
  Cc: linux-arm-kernel, linux-kernel, kvmarm, linux-acpi, acpica-devel,
	kvm


Disclaimer: While this patchset is buildable and testable on its own,
it is not ready to be merged, as it depends on bits from another
patchset that will supersede the one in [0], and will enable HDBSS.
See note below on patches 1,2.

My intent in sharing this early is to gather feedback on
implementation and testing methods, to make sure it's ready
when [0] becomes ready.

===

Create an arch-generic dirty-bit cleaning acceleration interface, which
compiles-out if the arch does not implement it, and creates no new API.
Using that, implement an arm64 accelerator based on HACDBS.
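
As a rough illustration of the "compiles-out" idea above (not the
interface from this series), a generic caller can use a hook that
becomes an inline stub when the arch provides no accelerator. All
names here (DEMO_HAVE_HW_DIRTY_BIT, demo_hw_clean_dirty(),
demo_clear_dirty()) are hypothetical and exist only for this sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * HYPOTHETICAL sketch: none of these names come from the patchset.
 * DEMO_HAVE_HW_DIRTY_BIT stands in for whatever config option an arch
 * would select when it implements the accelerator.
 */
#ifdef DEMO_HAVE_HW_DIRTY_BIT
/* An arch with an accelerator supplies the real routine elsewhere. */
bool demo_hw_clean_dirty(unsigned long *bitmap, size_t nbits);
#else
/* Without one, the hook is a stub the compiler can drop entirely. */
static inline bool demo_hw_clean_dirty(unsigned long *bitmap, size_t nbits)
{
	(void)bitmap;
	(void)nbits;
	return false;	/* "not handled": caller uses the software path */
}
#endif

/*
 * Generic caller: try the accelerated path first, otherwise clear the
 * dirty bits in software. Returns how many bits the software path
 * cleared (0 if the hook handled the request).
 */
static size_t demo_clear_dirty(unsigned long *bitmap, size_t nbits)
{
	const size_t bits_per_word = sizeof(unsigned long) * 8;
	size_t cleaned = 0;

	assert(bitmap != NULL);

	if (demo_hw_clean_dirty(bitmap, nbits))
		return 0;

	for (size_t i = 0; i < nbits; i++) {
		unsigned long mask = 1UL << (i % bits_per_word);

		if (bitmap[i / bits_per_word] & mask) {
			bitmap[i / bits_per_word] &= ~mask;
			cleaned++;
		}
	}
	return cleaned;
}
```

With the macro undefined, the stub returns false at compile time, so
the generic path carries no extra cost and no new user-visible API is
needed.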

This implementation is able to accelerate the cleaning on both
dirty-bitmap and dirty-ring tracking mechanisms on KVM.

Patches 1 & 2 are here just to make this testable, as this patchset
depends on bits from HDBSS that are not upstream yet.

Patch 2 should be included in the HDBSS patchset, and patch 1
is a bunch of bits that I collected across other patches so this can
work. So feel free to ignore them on review, as they should be reviewed
in the HDBSS patchset.

To properly use HACDBS, a PPI IRQ is required that triggers either on
error or when processing is complete. It's called HACDBSIRQ, and there
is currently no upstream way of announcing it in ACPI tables, so this
patchset uses the table/index suggested in [1], which is not merged at
the moment.

Kernel v7.2-rc1 + this patchset builds properly and passes both the KVM
selftests for dirty-bit tracking[2] and a QEMU live migration test, with
HW HACDBS both enabled and disabled.

In terms of performance improvement, tests were done using
dirty_log_perf_test[3] to measure the time spent on the following ioctls:
a - KVM_GET_DIRTY_LOG,     using dirty-bitmap without manual protect
    command:    ./dirty_log_perf_test -m 3 -m 6 -m 12 -g
b - KVM_CLEAR_DIRTY_LOG,   using dirty-bitmap with manual protect
    command:    ./dirty_log_perf_test -m 3 -m 6 -m 12
c - KVM_RESET_DIRTY_RINGS, using dirty-ring, using 4096 entries/vcpu.
    command:    ./dirty_log_perf_test -m 3 -m 6 -m 12 -d 4096

Tests run on the model show that runtime was reduced by:
-(a) 82.19% (0.16% stdev) for 1GB memory, and
     82.45% (0.02% stdev) for 3GB memory
-(b) 81.74% (0.19% stdev) for 1GB memory, and
     82.40% (0.38% stdev) for 3GB memory
-(c) 70.90% (0.18% stdev) for 1GB memory, and
     70.92% (0.01% stdev) for 3GB memory
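
To read these reductions as speedup factors, the usual conversion is
old_time / new_time = 100 / (100 - reduction%). A small sketch of that
arithmetic follows; reduction_to_speedup() is a hypothetical helper
written for this note, not part of the patchset or the selftests:

```c
#include <assert.h>

/*
 * HYPOTHETICAL helper: converts "runtime reduced by X%" into a speedup
 * factor, i.e. old_time / new_time = 100 / (100 - X).
 */
static double reduction_to_speedup(double reduction_pct)
{
	/* A 100% reduction would mean zero runtime; reject it. */
	assert(reduction_pct >= 0.0 && reduction_pct < 100.0);
	return 100.0 / (100.0 - reduction_pct);
}
```

By this conversion, the 82.19% figure corresponds to roughly a 5.6x
speedup, and the 70.90% figure to roughly 3.4x.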

The numbers above already take into account the S2 hugepage-splitting
improvements implemented by [4].

Please let me know if you have any questions :)

Thanks for reviewing!
Leo


Changes since v1:
- Improvements on splitting with manual protect, skipping when cleaning
  pages from the same level-2 entry. (new patch)
- Got the correct concept of chunk_size and thus:
  - Corrected it to a reasonable chunk to do page splitting before
    rescheduling, considering new improvements from [4].
  - TTWL is not based on chunk size, so fix it in LAST_LEVEL
- Minor fixes, removing debugging traces
v1 Link: https://lore.kernel.org/all/20260430111424.3479613-2-leo.bras@arm.com/


[0]: https://lore.kernel.org/all/20260225040421.2683931-1-zhengtian10@huawei.com/
[1]: https://github.com/tianocore/edk2/issues/12409
[2]: dirty_log_test && dirty_log_perf_test
[3]: using this patchset to enable dirty-ring on dirty_log_perf_test:
     https://lore.kernel.org/all/20260629105950.1790259-1-leo.bras@arm.com/
[4]: https://lore.kernel.org/all/20260618131447.764085-1-leo.bras@arm.com/

Leonardo Bras (13):
  KVM: arm64: HDBSS bits
  KVM: arm64: Enable eager hugepage splitting if HDBSS is available
  arm64/cpufeature: Add system-wide FEAT_HACDBS detection
  arm64/sysreg: Add HACDBS consumer and base registers
  KVM: arm64: Detect (via ACPI) and initialize HACDBSIRQ
  KVM: arm64: dirty_bit: Add base FEAT_HACDBS cleaning routine
  kvm: Add arch-generic interface for hw-accelerated dirty-bitmap
    cleaning
  KVM: arm64: Add hardware-accelerated dirty-bitmap cleaning routine
  KVM: arm64: Dirty-bitmap: avoid splitting previously split blocks
  kvm/dirty_ring: Introduce get_memslot and move helpers to header
  kvm/dirty_ring: Add arch-generic interface for hw-accelerated
    dirty-ring cleaning
  KVM: arm64: Add hardware-accelerated dirty-ring cleaning routine
  KVM: arm64: Enable KVM_HW_DIRTY_BIT

 arch/arm64/include/asm/acpi.h          |   3 +
 arch/arm64/include/asm/cpufeature.h    |  10 +
 arch/arm64/include/asm/kvm_dirty_bit.h |  67 ++++
 arch/arm64/include/asm/kvm_pgtable.h   |   3 +
 include/acpi/actbl2.h                  |   1 +
 include/linux/kvm_dirty_bit.h          |  34 ++
 include/linux/kvm_dirty_ring.h         |  12 +
 include/linux/kvm_host.h               |   3 +
 arch/arm64/kernel/cpufeature.c         |  20 ++
 arch/arm64/kvm/arm.c                   |   5 +
 arch/arm64/kvm/dirty_bit.c             | 411 +++++++++++++++++++++++++
 arch/arm64/kvm/hyp/pgtable.c           |  15 +-
 arch/arm64/kvm/mmu.c                   |  12 +-
 virt/kvm/dirty_ring.c                  |  34 +-
 virt/kvm/kvm_main.c                    |  13 +-
 arch/arm64/kvm/Kconfig                 |   1 +
 arch/arm64/kvm/Makefile                |   2 +-
 arch/arm64/tools/cpucaps               |   2 +
 arch/arm64/tools/sysreg                |  30 ++
 virt/kvm/Kconfig                       |   3 +
 20 files changed, 659 insertions(+), 22 deletions(-)
 create mode 100644 arch/arm64/include/asm/kvm_dirty_bit.h
 create mode 100644 include/linux/kvm_dirty_bit.h
 create mode 100644 arch/arm64/kvm/dirty_bit.c


base-commit: dc59e4fea9d83f03bad6bddf3fa2e52491777482
-- 
2.54.0



Thread overview: 47+ messages
2026-06-29 11:17 [PATCH v2 00/13] KVM Dirty-bit cleaning hw accelerator (HACDBS) Leonardo Bras
2026-06-29 11:17 ` [PATCH v2 01/13] KVM: arm64: HDBSS bits Leonardo Bras
2026-06-29 11:34   ` sashiko-bot
2026-06-29 12:57     ` Leonardo Bras
2026-06-29 11:17 ` [PATCH v2 02/13] KVM: arm64: Enable eager hugepage splitting if HDBSS is available Leonardo Bras
2026-06-29 11:36   ` sashiko-bot
2026-06-29 14:47     ` Leonardo Bras
2026-06-29 17:06       ` Oliver Upton
2026-06-30 12:58         ` Leonardo Bras
2026-06-30 15:44           ` Oliver Upton
2026-06-30 17:09             ` Leonardo Bras
2026-06-30 18:43               ` Oliver Upton
2026-06-29 11:17 ` [PATCH v2 03/13] arm64/cpufeature: Add system-wide FEAT_HACDBS detection Leonardo Bras
2026-06-29 11:17 ` [PATCH v2 04/13] arm64/sysreg: Add HACDBS consumer and base registers Leonardo Bras
2026-06-29 11:17 ` [PATCH v2 05/13] KVM: arm64: Detect (via ACPI) and initialize HACDBSIRQ Leonardo Bras
2026-06-29 11:32   ` sashiko-bot
2026-06-29 15:43     ` Leonardo Bras
2026-06-29 16:52       ` Vladimir Murzin
2026-06-30 14:52         ` Leonardo Bras
2026-06-29 17:22   ` Oliver Upton
2026-06-30 14:50     ` Leonardo Bras
2026-06-30 16:03       ` Oliver Upton
2026-06-30 17:19         ` Leonardo Bras
2026-06-29 11:17 ` [PATCH v2 06/13] KVM: arm64: dirty_bit: Add base FEAT_HACDBS cleaning routine Leonardo Bras
2026-06-29 11:29   ` sashiko-bot
2026-06-29 15:54     ` Leonardo Bras
2026-06-29 17:36   ` Oliver Upton
2026-06-30 14:59     ` Leonardo Bras
2026-06-30 19:06       ` Oliver Upton
2026-06-29 11:17 ` [PATCH v2 07/13] kvm: Add arch-generic interface for hw-accelerated dirty-bitmap cleaning Leonardo Bras
2026-06-29 11:38   ` sashiko-bot
2026-06-29 16:07     ` Leonardo Bras
2026-06-29 11:17 ` [PATCH v2 08/13] KVM: arm64: Add hardware-accelerated dirty-bitmap cleaning routine Leonardo Bras
2026-06-29 11:45   ` sashiko-bot
2026-06-29 16:49     ` Leonardo Bras
2026-06-29 11:17 ` [PATCH v2 09/13] KVM: arm64: Dirty-bitmap: avoid splitting previously split blocks Leonardo Bras
2026-06-29 11:39   ` sashiko-bot
2026-06-29 17:07     ` Leonardo Bras
2026-06-29 11:17 ` [PATCH v2 10/13] kvm/dirty_ring: Introduce get_memslot and move helpers to header Leonardo Bras
2026-06-29 11:17 ` [PATCH v2 11/13] kvm/dirty_ring: Add arch-generic interface for hw-accelerated dirty-ring cleaning Leonardo Bras
2026-06-29 11:49   ` sashiko-bot
2026-06-29 17:09     ` Leonardo Bras
2026-06-29 11:18 ` [PATCH v2 12/13] KVM: arm64: Add hardware-accelerated dirty-ring cleaning routine Leonardo Bras
2026-06-29 11:49   ` sashiko-bot
2026-06-29 17:26     ` Leonardo Bras
2026-06-29 11:18 ` [PATCH v2 13/13] KVM: arm64: Enable KVM_HW_DIRTY_BIT Leonardo Bras
2026-06-29 11:52   ` sashiko-bot
