Linux-ARM-Kernel Archive on lore.kernel.org
From: Leonardo Bras <leo.bras@arm.com>
To: Will Deacon <will@kernel.org>
Cc: Leonardo Bras <leo.bras@arm.com>,
	Oliver Upton <oupton@kernel.org>, Marc Zyngier <maz@kernel.org>,
	Joey Gouly <joey.gouly@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Zenghui Yu <yuzenghui@huawei.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Fuad Tabba <tabba@google.com>,
	Raghavendra Rao Ananta <rananta@google.com>,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 1/2] KVM: arm64: Introduce S2 walker SKIP return options
Date: Tue, 19 May 2026 13:56:48 +0100
Message-ID: <agxeEA14cP3yMV1m@devkitleo>
In-Reply-To: <agxa-TydN1PoKsYn@willie-the-truck>

On Tue, May 19, 2026 at 01:43:37PM +0100, Will Deacon wrote:
> On Mon, May 18, 2026 at 02:45:59PM +0100, Leonardo Bras wrote:
> > Hello Oliver, Will,
> > Thanks for reviewing!
> > 
> > On Mon, May 18, 2026 at 09:52:16AM +0100, Will Deacon wrote:
> > > On Mon, May 18, 2026 at 12:22:47AM -0700, Oliver Upton wrote:
> > > > On Fri, May 15, 2026 at 08:59:02PM +0100, Leonardo Bras wrote:
> > > > > Introduce S2 walker return values:
> > > > > - SKIP_CHILDREN: skip walking the children of the current node
> > > > > - SKIP_SIBLINGS: skip walking the siblings of the current node
> > > > > 
> > > > > Also, modify __kvm_pgtable_visit() to honour the hint given by the
> > > > > above return values. Current walkers should not be impacted.
> > > > 
> > > > I'd rather see something based around new walk flags than introducing an
> > > > entirely new mechanic around return values.
> > > > 
> > > > e.g. you could split the LEAF flag into separate flags for blocks v.
> > > > pages:
> > > > 
> > > > 	KVM_PGTABLE_WALK_PAGE,
> > > > 	KVM_PGTABLE_WALK_BLOCK,
> > > > 	KVM_PGTABLE_WALK_LEAF	= KVM_PGTABLE_WALK_PAGE |
> > > > 				  KVM_PGTABLE_WALK_BLOCK,
> > > > 
> > > > and then let __kvm_pgtable_visit() decide how to steer the walk. You may
> > > > need some special handling to get the address arithmetic right when
> > > > skipping over a table of page descriptors.
> > 
> > I am probably not getting the whole inner workings of this solution, but 
> > IIUC the idea would be to walk the blocks, but not the pages, right?
> > 
> > Blocks meaning level2- and pages being level3?
> >  
> > > I was wondering along similar lines, but maybe it would be useful just
> > > to pass a maximum level to the walker logic? That feels like the most
> > > general case without complicating the existing logic.
> > 
> > This proposal seems simpler for me to understand, and indeed looks like a
> > better solution than what I have proposed: it takes care of the
> > 'already split' case with better performance, as it doesn't even walk a
> > single level-3 entry.
> > 
> > On the 'splitting' case, it also works flawlessly if the memory is given in 
> > level-2 blocks. There is only one case that I would like to address here:
> > 
> > - Memory given in level-1 blocks (say 1GB)
> > - Walker flag says 'walk down to level-2 only'
> > - Split Walker on level-1 will break page down to (up to) level-3 entries.
> > - Walker will continue to be called on level-2 entries, even though it's 
> >   not necessary.
> 
> If you're only visiting leaves, why would it be called on the level-2
> table entries?
> 

Because once the leaf is turned into a table by the splitting walker, it 
gets reloaded and walked. This is an excerpt of __kvm_pgtable_visit():

---

 if (!table && (ctx.flags & KVM_PGTABLE_WALK_LEAF)) {
         ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_LEAF);
         reload = true;
 }

 /*
  * Reload the page table after invoking the walker callback for leaf
  * entries or after pre-order traversal, to allow the walker to descend
  * into a newly installed or replaced table.
  */
 if (reload) {
         ctx.old = READ_ONCE(*ptep);
         table = kvm_pte_table(ctx.old, level);
 }

 if (!kvm_pgtable_walk_continue(data->walker, ret))
         goto out;

 if (!table) {
         data->addr = ALIGN_DOWN(data->addr, kvm_granule_size(level));
         data->addr += kvm_granule_size(level);
         goto out;
 }

 childp = (kvm_pteref_t)kvm_pte_follow(ctx.old, mm_ops);
 ret = __kvm_pgtable_walk(data, mm_ops, childp, level + 1);
 if (!kvm_pgtable_walk_continue(data->walker, ret))
         goto out;

---

After the leaf is visited, reload is set to true, which causes the newly
created table to be detected as such and walked down. This means even the
page that just got split will be walked down to the specified level.

Example:
- Split this level-1 leaf:
  - Walker creates the whole structure up to the given level (currently 3)
  - Walker returns, the entry gets reloaded, a table is detected, and the
    walk goes down into it
  - Level 2 entries walked (which is unnecessary)
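
The sequence above can be sketched with a small userspace toy model. This
is not kernel code: every toy_* name, the 4-entry tables and the
'max_level' parameter (standing in for the maximum-level idea suggested
earlier in the thread) are assumptions made up for illustration. It only
mimics the "visit leaf, reload, descend if it became a table" flow from the
excerpt and counts how many entries get examined at each level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_ENTRIES	4	/* entries per toy table (real tables are larger) */
#define TOY_LAST_LEVEL	3	/* deepest level, holding page-sized leaves */
#define TOY_POOL_TABLES	64	/* static pool, so no allocator is needed */

struct toy_pte {
	bool table;		/* true: entry points to a child table */
	struct toy_pte *child;	/* valid only when 'table' is true */
};

struct toy_stats {
	int visited[TOY_LAST_LEVEL + 1];	/* entries examined, per level */
	int leaf_cb[TOY_LAST_LEVEL + 1];	/* leaf callbacks run, per level */
};

static struct toy_pte toy_pool[TOY_POOL_TABLES][TOY_ENTRIES];
static size_t toy_pool_next;

static struct toy_pte *toy_alloc_table(void)
{
	assert(toy_pool_next < TOY_POOL_TABLES);
	return toy_pool[toy_pool_next++];
}

/* Stand-in for the split callback: build tables down to the last level. */
static void toy_split(struct toy_pte *pte, int level)
{
	if (level >= TOY_LAST_LEVEL)
		return;

	pte->child = toy_alloc_table();
	pte->table = true;
	for (int i = 0; i < TOY_ENTRIES; i++)
		toy_split(&pte->child[i], level + 1);
}

/* Mimics the visit flow: leaf callback, reload, then descend if table. */
static void toy_walk(struct toy_pte *pte, int level, int max_level,
		     struct toy_stats *st)
{
	bool table = pte->table;

	st->visited[level]++;
	if (!table) {
		st->leaf_cb[level]++;
		toy_split(pte, level);	/* callback may install a table */
		table = pte->table;	/* the "reload" step */
	}

	if (!table || level >= max_level)
		return;

	for (int i = 0; i < TOY_ENTRIES; i++)
		toy_walk(&pte->child[i], level + 1, max_level, st);
}

/* Walk a single level-1 leaf, going no deeper than 'max_level'. */
struct toy_stats toy_run(int max_level)
{
	struct toy_pte root = { .table = false, .child = NULL };
	struct toy_stats st;

	memset(&st, 0, sizeof(st));
	memset(toy_pool, 0, sizeof(toy_pool));
	toy_pool_next = 0;

	toy_walk(&root, 1, max_level, &st);
	return st;
}
```

In this model, toy_run(3) examines the level-1 leaf, then every level-2 and
level-3 entry below it, even though the split had already built all of
them. toy_run(2) stops before level 3 but still examines the freshly
created level-2 entries, which is the extra work described in the example.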

Please let me know if I am misunderstanding something.

Thanks!
Leo


Thread overview: 13+ messages
2026-05-15 19:59 [RFC PATCH 0/2] Optimize S2 page splitting Leonardo Bras
2026-05-15 19:59 ` [RFC PATCH 1/2] KVM: arm64: Introduce S2 walker SKIP return options Leonardo Bras
2026-05-18  7:22   ` Oliver Upton
2026-05-18  8:52     ` Will Deacon
2026-05-18 13:45       ` Leonardo Bras
2026-05-19 12:43         ` Will Deacon
2026-05-19 12:56           ` Leonardo Bras [this message]
2026-05-19 13:15             ` Will Deacon
2026-05-19 14:35               ` Leonardo Bras
2026-05-19 21:21                 ` Oliver Upton
2026-05-15 19:59 ` [RFC PATCH 2/2] KVM: arm64: Improve splitting performance by using SKIP return values Leonardo Bras
2026-05-16  9:15 ` [RFC PATCH 0/2] Optimize S2 page splitting Marc Zyngier
2026-05-18 14:09   ` Leonardo Bras
