Re: [PATCH v2 02/13] KVM: arm64: Enable eager hugepage splitting if HDBSS is available

From: Oliver Upton <oupton@kernel.org>
To: Leonardo Bras <leo.bras@arm.com>
Cc: sashiko-reviews@lists.linux.dev, Marc Zyngier <maz@kernel.org>,
	kvmarm@lists.linux.dev, kvm@vger.kernel.org,
	Wei-Lin Chang <weilin.chang@arm.com>
Subject: Re: [PATCH v2 02/13] KVM: arm64: Enable eager hugepage splitting if HDBSS is available
Date: Tue, 30 Jun 2026 08:44:20 -0700
Message-ID: <akPkVASjWWYQpkx1@kernel.org>
In-Reply-To: <akO9iHJmKN7MzTjM@LeoBrasDK>

On Tue, Jun 30, 2026 at 01:58:48PM +0100, Leonardo Bras wrote:
> On Mon, Jun 29, 2026 at 10:06:38AM -0700, Oliver Upton wrote:
> > > But this raises a topic I would like to understand:
> > > - Do we actually need this to be a block_size to assure correctness? or is 
> > >   it just about efficiency?
> > 
> > What value is there in having a chunk size larger than the largest
> > possible block mapping? The whole UAPI is deliberately tied up with page
> > table geometry.
> 
> Not larger, possibly smaller. My concern was the difference in pages to 
> split between 4k, 16k and 64k.

Ok, well in any case the upper bound is going to be the largest possible
block mapping for a given page granule.
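
For a rough sense of the numbers behind the 4k/16k/64k question, here is a
minimal sketch that derives pages-per-entry and entry size from the granule's
page shift. It assumes the usual layout where a table holds page_size / 8
eight-byte descriptors. The helper names are invented for illustration and are
not kernel API, and the sketch does not model which levels actually permit
block mappings for a given granule or feature set.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not kernel API. Level 3 is the page level. */

/* Address bits resolved per table level: log2(page_size / 8). */
static inline unsigned int bits_per_level(unsigned int page_shift)
{
	return page_shift - 3;
}

/* Number of pages covered by a single entry at 'level' (1 at level 3). */
static inline uint64_t pages_per_entry(unsigned int page_shift,
				       unsigned int level)
{
	assert(level <= 3);
	return UINT64_C(1) << ((3 - level) * bits_per_level(page_shift));
}

/* Bytes covered by a single entry at 'level'. */
static inline uint64_t entry_size(unsigned int page_shift, unsigned int level)
{
	return pages_per_entry(page_shift, level) << page_shift;
}
```

Under these assumptions a level-2 block covers 512 pages (2 MiB) with 4k
pages, 2048 pages (32 MiB) with 16k pages and 8192 pages (512 MiB) with 64k
pages, which is the spread in split work per block that the question is about.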

> > 
> > Overall, I'm not buying the argument for changing the behavior of
> > KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE. There are very good reasons for
> > *not* eagerly splitting the entire address space, especially if you know
> > the working set of the VM is small.
> > 
> > You can still use HDBSS without eagerly splitting, so long as block
> > mappings are {DBM, S2AP_W} = {0, 0} and leaf mappings (which have
> > a writable PFN) are {1, 0}.
> > 
> 
> Block mappings being read-only, and leaf mappings being writable-clean, 
> then? Could you please elaborate on why it does not need eager-split?

Read-only translations will continue to generate permission faults
whereas writable-clean descriptors can be updated by hardware. You get
the opportunity to split a block mapping lazily while preserving
hardware dirty tracking for page mappings.
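
As an illustration of the {DBM, S2AP_W} encoding being discussed, a minimal
sketch follows. The bit positions (S2AP write permission at bit 7, DBM at bit
51) are my reading of the stage-2 descriptor format and should be treated as
assumptions, as should the premise that hardware dirty state management is
enabled. The enum and function names are invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed stage-2 descriptor bits; names are illustrative only. */
#define S2_S2AP_W	(UINT64_C(1) << 7)	/* S2AP[1]: write permission */
#define S2_DBM		(UINT64_C(1) << 51)	/* dirty bit modifier */

enum s2_wstate {
	S2_READ_ONLY,		/* {DBM, S2AP_W} = {0, 0} */
	S2_WRITABLE_CLEAN,	/* {1, 0}: hardware may set S2AP_W on write */
	S2_WRITABLE_DIRTY,	/* {1, 1} */
	S2_WRITABLE_UNTRACKED,	/* {0, 1}: writable, no hardware dirty state */
};

/* Classify a descriptor by its {DBM, S2AP_W} pair. */
static inline enum s2_wstate s2_classify(uint64_t desc)
{
	bool dbm = desc & S2_DBM;
	bool w = desc & S2_S2AP_W;

	if (dbm)
		return w ? S2_WRITABLE_DIRTY : S2_WRITABLE_CLEAN;
	return w ? S2_WRITABLE_UNTRACKED : S2_READ_ONLY;
}

/*
 * Only the read-only state takes a permission fault on a guest write,
 * which is what gives software the chance to split a block lazily.
 */
static inline bool s2_write_takes_fault(uint64_t desc)
{
	return s2_classify(desc) == S2_READ_ONLY;
}
```

In this model a block left at {0, 0} still faults on write, while a page-level
entry at {1, 0} is dirtied by hardware without a fault.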

> As a review, what I recall from the strategy for hw dirty-logging was:
> - If we have HDBSS, add DBM for all writable pages {1, 1}
> - On dirty-logging start, make them writable-clean {1, 0}
>   - Can be done using HACDBS
>   - Enable HDBSS & HAFDBS
> 
> We don't have a fault for making pages dirty anymore, as this is done 
> by HAFDBS and recorded by HDBSS, so splitting does not happen on demand 
> anymore. So if we want to split pages, for better tracking granularity, or 
> anything, we have to eager-split them.

What I'm saying is that the presumption that eager page splitting is always
a net win is wrong. Nor is eager page splitting a hard requirement for
using HDBSS since you can set up the stage-2 in such a way that only
page granularity mappings are dirtied by hardware.

You could, in theory, have a workload that is read-heavy for a majority
of the VM's address space and writes to only a subset of that memory.
Eagerly splitting pages would likely regress the workload from a higher
rate of TLB refills / more TLB walk steps.

Lazily splitting would have the effect of leaving block mappings in
place for most of the VM. This is exactly why the VMM is in the driver
seat for deciding whether to lazily or eagerly split the stage-2.

The approach I think we may need is:

 - Use a software bit in the PTE to stash whether or not a PFN is
   'software-writable' when constructing the stage-2. By this I mean
   we've already faulted it in for write from the primary MMU.

 - At the time of write protection, reap the hardware-writable state
   from all PTEs but preserve the software-writable bit.

 - Whenever splitting a block mapping, set the DBM bit in the page-level
   PTEs if the block was software-writable and HDBSS is present.

That way you'd have sufficient metadata in the PTE to safely set DBM. We
could even make use of that metadata for write faults on non-HDBSS
hardware to avoid the overheads of user_mem_abort() (e.g. VMA lookup)
and treat it more like access flag updates.

The last point still needs some thought.
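
The three steps above can be sketched roughly as follows. This is only an
illustration under stated assumptions: S2_SW_WRITABLE is a hypothetical choice
of one of the software-reserved descriptor bits, the bit positions for S2AP_W
and DBM are assumed as in the architecture's stage-2 format, and none of the
function names exist in KVM. It models only the block path (map, write
protect, split) and ignores locking, break-before-make and TLB maintenance.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed descriptor bits; S2_SW_WRITABLE is a hypothetical software bit. */
#define S2_S2AP_W	(UINT64_C(1) << 7)
#define S2_DBM		(UINT64_C(1) << 51)
#define S2_SW_WRITABLE	(UINT64_C(1) << 56)

/*
 * Step 1: when constructing the mapping, record whether the PFN was
 * already faulted in for write from the primary MMU.
 */
static inline uint64_t s2_make_leaf(uint64_t addr_bits, bool write_faulted)
{
	uint64_t desc = addr_bits;

	if (write_faulted)
		desc |= S2_S2AP_W | S2_SW_WRITABLE;
	return desc;
}

/*
 * Step 2: write protection drops the hardware-writable state but
 * keeps the software-writable bit.
 */
static inline uint64_t s2_wrprotect(uint64_t desc)
{
	return desc & ~(S2_S2AP_W | S2_DBM);
}

/*
 * Step 3: a page-level child of a split block gets DBM only if the
 * block was software-writable and HDBSS is present.
 */
static inline uint64_t s2_split_child(uint64_t block_desc, bool has_hdbss)
{
	uint64_t child = block_desc;	/* attributes inherited from the block */

	if ((block_desc & S2_SW_WRITABLE) && has_hdbss)
		child |= S2_DBM;
	return child;
}
```

With this metadata a write-protected block stays at {0, 0} and faults on
write, and the children produced by splitting it come out as {1, 0} only when
the backing PFN is known to be writable.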

Thanks,
Oliver
