Re: [PATCH v2 02/13] KVM: arm64: Enable eager hugepage splitting if HDBSS is available

From: Leonardo Bras <leo.bras@arm.com>
To: Oliver Upton <oupton@kernel.org>
Cc: Leonardo Bras <leo.bras@arm.com>,
	sashiko-reviews@lists.linux.dev, Marc Zyngier <maz@kernel.org>,
	kvmarm@lists.linux.dev, kvm@vger.kernel.org,
	Wei-Lin Chang <weilin.chang@arm.com>
Subject: Re: [PATCH v2 02/13] KVM: arm64: Enable eager hugepage splitting if HDBSS is available
Date: Tue, 30 Jun 2026 18:09:56 +0100
Message-ID: <akP4ZDjZIu2_CfVF@LeoBrasDK>
In-Reply-To: <akPkVASjWWYQpkx1@kernel.org>

On Tue, Jun 30, 2026 at 08:44:20AM -0700, Oliver Upton wrote:
> On Tue, Jun 30, 2026 at 01:58:48PM +0100, Leonardo Bras wrote:
> > On Mon, Jun 29, 2026 at 10:06:38AM -0700, Oliver Upton wrote:
> > > > But this raises a topic I would like to understand:
> > > > - Do we actually need this to be a block_size to ensure correctness, or
> > > >   is it just about efficiency?
> > > 
> > > What value is there in having a chunk size larger than the largest
> > > possible block mapping? The whole UAPI is deliberately tied up with page
> > > table geometry.
> > 
> > Not larger, possibly smaller. My concern was the difference in pages to 
> > split between 4k, 16k and 64k.
> 
> Ok, well in any case the upper bound is going to be the largest possible
> block mapping for a given page granule.
>

Sure, we can do this.

I was worried because that would mean dealing, per granule, with 256k pages
for PGSIZE 4k, 4M pages for PGSIZE 16k, and 64M pages for PGSIZE 64k.
Those values differ by orders of magnitude, and I worried that a single run
would take too long or require too much cache.
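
For reference, the page counts above fall out of the table geometry: with
8-byte descriptors a table holds granule/8 entries, and a level-1 block
spans two table levels' worth of pages. A minimal sketch of that arithmetic
follows. The helper names are made up for illustration, and whether a
level-1 block is actually usable for a given granule depends on the
hardware features present, which this does not model.

```c
#include <stdint.h>

/* Entries per translation table: one 8-byte descriptor per entry. */
static inline uint64_t ptrs_per_table(unsigned int page_shift)
{
	return 1ULL << (page_shift - 3);
}

/*
 * Pages covered by one level-1 block. A level-2 entry covers
 * ptrs_per_table() pages (level 3 is the page level), and a level-1
 * entry covers that many level-2 entries.
 */
static inline uint64_t pages_per_level1_block(unsigned int page_shift)
{
	uint64_t n = ptrs_per_table(page_shift);

	return n * n;
}

/* Size in bytes of a level-1 block for the given granule. */
static inline uint64_t level1_block_size(unsigned int page_shift)
{
	return pages_per_level1_block(page_shift) << page_shift;
}
```

With page_shift values of 12, 14 and 16 this gives 256k, 4M and 64M pages
per level-1 block, respectively.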

But if you think that's OK, then sure.

> > > 
> > > Overall, I'm not buying the argument for changing the behavior of
> > > KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE. There are very good reasons for
> > > *not* eagerly splitting the entire address space, especially if you know
> > > the working set of the VM is small.
> > > 
> > > You can still use HDBSS without eagerly splitting, so long as block
> > > mappings are {DBM, S2AP_W} = {0, 0} and leaf mappings (which have
> > > a writable PFN) are {1, 0}.
> > > 
> > 
> > Block mappings being read-only, and leaf mappings being writable-clean, 
> > then? Could you please elaborate on why it does not need eager-split?
> 
> Read-only translations will continue to generate permission faults
> whereas writable-clean descriptors can be updated by hardware. You get
> the opportunity to split a block mapping lazily while preserving
> hardware dirty tracking for page mappings.
> 

So you suggest we only set the DBM bit after we split the block, which will
happen only when a block is written to for the first time after dirty
logging starts?
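
For reference, a minimal sketch of the {DBM, S2AP_W} states discussed
above. The macro and enum names are illustrative only, and the bit
positions (7 for the stage-2 write permission, 51 for DBM) are assumed from
the stage-2 descriptor layout and should be checked against the
architecture manual.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions; the names are made up for this sketch. */
#define S2_PTE_S2AP_W	(1ULL << 7)	/* write permission */
#define S2_PTE_DBM	(1ULL << 51)	/* dirty bit modifier */

enum s2_write_state {
	S2_READ_ONLY,		/* {DBM, S2AP_W} = {0, 0}: writes fault */
	S2_WRITABLE_CLEAN,	/* {1, 0}: hardware may set S2AP_W on write */
	S2_WRITABLE_DIRTY,	/* {1, 1}: writable and already dirty */
	S2_WRITABLE_NO_DBM,	/* {0, 1}: writable, no hardware tracking */
};

/* Classify a descriptor by its DBM and S2AP_W bits only. */
static inline enum s2_write_state s2_classify(uint64_t pte)
{
	bool dbm = pte & S2_PTE_DBM;
	bool w = pte & S2_PTE_S2AP_W;

	if (dbm)
		return w ? S2_WRITABLE_DIRTY : S2_WRITABLE_CLEAN;
	return w ? S2_WRITABLE_NO_DBM : S2_READ_ONLY;
}

/* Of the four states, only read-only makes a guest write trap to KVM. */
static inline bool s2_write_faults(uint64_t pte)
{
	return s2_classify(pte) == S2_READ_ONLY;
}
```

In these terms, the suggestion is to keep blocks in S2_READ_ONLY, so the
first write still faults and gives a chance to split, and to create the
resulting page mappings as S2_WRITABLE_CLEAN.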

> > As a review, what I recall from the strategy for hw dirty-logging was:
> > - If we have HDBSS, add DBM for all writable pages {1, 1}
> > - On dirty-logging start, make them writable-clean {1, 0}
> >   - Can be done using HACDBS
> >   - Enable HDBSS & HAFDBS
> > 
> > We don't have a fault for making pages dirty anymore, as this is done 
> > by HAFDBS and recorded by HDBSS, so splitting does not happen on demand 
> > anymore. So if we want to split pages, for better tracking granularity, or 
> > anything, we have to eager-split them.
> 
> What I'm saying is the presumption that eager page splitting is always a
> net-win is wrong. Nor is eager page splitting a hard requirement for
> using HDBSS since you can set up the stage-2 in such a way that only
> page granularity mappings are dirtied by hardware.
> 
> You could, in theory, have a workload that is read-heavy for a majority
> of the VM's address space and writes to only a subset of that memory.
> Eagerly splitting pages would likely regress the workload from a higher
> rate of TLB refills / more TLB walk steps.
> 
> Lazily splitting would have the effect of leaving block mappings in
> place for most of the VM. This is exactly why the VMM is in the driver
> seat for deciding whether to lazily or eagerly split the stage-2.
> 

I see your point.
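
As a sketch of what "the VMM is in the driver seat" could look like, the
hypothetical helper below picks the value a VMM might pass to
KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE. It assumes that a chunk size of 0 means
"no eager splitting" and that the supported sizes are described by a bitmap
where bit N stands for a 2^N-byte block; both assumptions should be checked
against the KVM API documentation.

```c
#include <stdint.h>

/*
 * Hypothetical VMM-side policy helper.
 *
 * supported:  assumed bitmap, bit N set means 2^N bytes is a valid size
 * want_eager: non-zero if the VMM expects writes across most of the
 *             address space and prefers to split up front
 * limit:      largest chunk size the VMM is willing to use
 *
 * Returns the largest supported size not above 'limit', or 0 to leave
 * splitting to the lazy (fault-driven) path.
 */
static inline uint64_t pick_split_chunk_size(uint64_t supported,
					     int want_eager, uint64_t limit)
{
	int bit;

	if (!want_eager)
		return 0;

	for (bit = 63; bit >= 0; bit--) {
		uint64_t size = 1ULL << bit;

		if ((supported & size) && size <= limit)
			return size;
	}

	return 0;
}
```

A read-heavy VM with a small write working set would pass want_eager = 0
and keep its block mappings in place.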

> The approach I think we may need is:
> 
>  - Use a software bit in the PTE to stash whether or not a PFN is
>    'software-writable' when constructing the stage-2. By this I mean
>    we've already faulted it in for write from the primary MMU.
> 
>  - At the time of write protection, reap the hardware-writable state
>    from all PTEs but preserve the software-writable bit.
> 
>  - Whenever splitting a block mapping, set the DBM bit in the page-level
>    PTEs if the block was software-writable and HDBSS is present.
> 
> That way you'd have sufficient metadata in the PTE to safely set DBM.

I remember that, for some reason I can't recall, it would not be great to
set DBM at dirty-log start, and that we should instead have it set from VM
creation. Maybe it had to do with part of the page table using the old
encoding (no DBM) while the other part used the new one.

IIRC, only blocks that are backed by writable memory (S1) were supposed to 
receive the DBM bit. We could use that info for deciding what to split, 
then.
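
To check my understanding, the three steps in the approach quoted above
could be sketched roughly as follows. This only models attribute bits (no
output address or descriptor type), the helper names are hypothetical, and
the choice of bit 55 for the software-writable flag is an assumption made
for illustration, not something taken from the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions; names are made up for this sketch. */
#define S2_PTE_S2AP_W		(1ULL << 7)	/* hardware write permission */
#define S2_PTE_DBM		(1ULL << 51)	/* dirty bit modifier */
#define S2_PTE_SW_WRITABLE	(1ULL << 55)	/* hypothetical software bit */

/*
 * Step 1: when installing a mapping that was faulted in for write from
 * the primary MMU, remember that fact in the software bit.
 */
static inline uint64_t s2_mk_writable(uint64_t pte)
{
	return pte | S2_PTE_S2AP_W | S2_PTE_SW_WRITABLE;
}

/*
 * Step 2: write protection drops the hardware-writable state but
 * preserves the software-writable bit.
 */
static inline uint64_t s2_wrprotect(uint64_t pte)
{
	return pte & ~(S2_PTE_S2AP_W | S2_PTE_DBM);
}

/*
 * Step 3: attributes for a page-level PTE created by splitting a block.
 * DBM is set only if the block was software-writable and HDBSS is present.
 */
static inline uint64_t s2_split_child(uint64_t block_pte, bool has_hdbss)
{
	uint64_t child = block_pte;

	if (has_hdbss && (block_pte & S2_PTE_SW_WRITABLE))
		child |= S2_PTE_DBM;

	return child;
}
```

A write-protected, software-writable block is {DBM, S2AP_W} = {0, 0}, and
its children come out as {1, 0}, i.e. writable-clean.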

Another option would be to split when we are collecting a dirty-entry from 
HDBSS, but for live migration that would mean we have to transfer the whole 
block (possibly a large LEVEL1 block), because we have no idea which part 
of it got dirty.
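
To illustrate the cost, here is a small sketch of the range that has to be
treated as dirty when all we know is that an address was written through a
mapping of a given size. The struct and function names are hypothetical.

```c
#include <stdint.h>

struct gfn_range {
	uint64_t first_gfn;	/* first guest frame number to mark dirty */
	uint64_t nr_pages;	/* number of pages to mark dirty */
};

/*
 * Range to mark dirty when 'ipa' was written through a mapping that
 * covers 2^map_shift bytes, with pages of 2^page_shift bytes.
 */
static inline struct gfn_range dirty_range(uint64_t ipa,
					   unsigned int map_shift,
					   unsigned int page_shift)
{
	struct gfn_range r;
	uint64_t base = ipa & ~((1ULL << map_shift) - 1);

	r.first_gfn = base >> page_shift;
	r.nr_pages = 1ULL << (map_shift - page_shift);

	return r;
}
```

With 4k pages, a page mapping (map_shift = 12) yields a single page, while
a 1G block (map_shift = 30) yields 262144 pages to transfer.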



> We
> could even make use of that metadata for write faults on non-HDBSS
> hardware to avoid the overheads of user_mem_abort() (e.g. VMA lookup)
> and treat it more like access flag updates.
> 
> The last point still needs some thought.
> 

I don't quite understand this yet, but I will take a look at how that would
work.
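
My tentative reading of the idea, as a sketch only: on a write permission
fault, a PTE that already carries the software-writable flag could be made
writable in place, much like an access flag update, and only PTEs without
it would go through the full fault path. The names and the bit chosen for
the software flag are assumptions, and a real implementation would need an
atomic update plus dirty logging, which are left out here.

```c
#include <stdint.h>

/* Assumed bit positions; names are made up for this sketch. */
#define S2_PTE_S2AP_W		(1ULL << 7)	/* hardware write permission */
#define S2_PTE_SW_WRITABLE	(1ULL << 55)	/* hypothetical software bit */

enum wfault_action {
	WFAULT_FAST,	/* fixed up in place, no VMA lookup needed */
	WFAULT_SLOW,	/* fall back to the full path, e.g. user_mem_abort() */
};

/* Handle a write permission fault on the descriptor at 'ptep'. */
static inline enum wfault_action s2_write_fault(uint64_t *ptep)
{
	if (!(*ptep & S2_PTE_SW_WRITABLE))
		return WFAULT_SLOW;

	/* The caller would also have to record the page as dirty. */
	*ptep |= S2_PTE_S2AP_W;

	return WFAULT_FAST;
}
```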


Thanks!
Leo
