* [PATCH] KVM: x86: use again the flush argument of __link_shadow_page()

From: Paolo Bonzini @ 2026-05-03 21:09 UTC
To: linux-kernel, kvm; +Cc: seanjc

Except in the case of parentless nested-TDP pages, mmu_page_zap_pte()
clears the SPTE but leaves the invalid_list empty.  In this case, using
kvm_flush_remote_tlbs() as kvm_mmu_remote_flush_or_zap() does is overkill.
Avoid flushing the entirety of the remote TLBs unless the invalid_list
was populated: instead, use a more efficient gfn-targeting flush (if
available) and skip it altogether if the caller guarantees that a TLB
flush is not necessary.

Based-on: <20260503201029.106481-1-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 892246204435..85bec8eeace8 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2541,8 +2541,10 @@ static void __link_shadow_page(struct kvm *kvm,
                parent_sp = sptep_to_sp(sptep);
                WARN_ON_ONCE(parent_sp->role.level == PG_LEVEL_4K);

-               mmu_page_zap_pte(kvm, parent_sp, sptep, &invalid_list);
-               kvm_mmu_remote_flush_or_zap(kvm, &invalid_list, true);
+               if (mmu_page_zap_pte(kvm, parent_sp, sptep, &invalid_list))
+                       kvm_mmu_commit_zap_page(kvm, &invalid_list);
+               else if (flush)
+                       kvm_flush_remote_tlbs_sptep(kvm, sptep);
        }

        spte = make_nonleaf_spte(sp->spt, sp_ad_disabled(sp));
--
2.54.0
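For background: the helper being bypassed here, kvm_mmu_remote_flush_or_zap(),
has looked roughly like the sketch below in recent kernels (reconstructed from
memory of arch/x86/kvm/mmu/mmu.c, not copied from the tree this patch targets,
so the exact body may differ).  It shows why hardcoding @remote_flush to true
is overkill when invalid_list is empty: the fallback is always a flush of the
entire remote TLBs rather than a gfn-targeted one.

static bool kvm_mmu_remote_flush_or_zap(struct kvm *kvm,
                                        struct list_head *invalid_list,
                                        bool remote_flush)
{
        if (!remote_flush && list_empty(invalid_list))
                return false;

        if (!list_empty(invalid_list))
                /* Shadow pages were zapped: commit them (full remote flush). */
                kvm_mmu_commit_zap_page(kvm, invalid_list);
        else
                /* Nothing zapped, but remote_flush == true: full flush anyway. */
                kvm_flush_remote_tlbs(kvm);
        return true;
}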
* Re: [PATCH] KVM: x86: use again the flush argument of __link_shadow_page()

From: Sean Christopherson @ 2026-05-04 17:27 UTC
To: Paolo Bonzini; +Cc: linux-kernel, kvm

x86/mmu

On Sun, May 03, 2026, Paolo Bonzini wrote:
> Except in the case of parentless nested-TDP pages, mmu_page_zap_pte()
> clears the SPTE but leaves the invalid_list empty.  In this case, using
> kvm_flush_remote_tlbs() as kvm_mmu_remote_flush_or_zap() does is overkill.
> Avoid flushing the entirety of the remote TLBs unless the invalid_list
> was populated: instead, use a more efficient gfn-targeting flush (if
> available) and skip it altogether if the caller guarantees that a TLB
> flush is not necessary.
>
> Based-on: <20260503201029.106481-1-pbonzini@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 892246204435..85bec8eeace8 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2541,8 +2541,10 @@ static void __link_shadow_page(struct kvm *kvm,
>                 parent_sp = sptep_to_sp(sptep);
>                 WARN_ON_ONCE(parent_sp->role.level == PG_LEVEL_4K);
>
> -               mmu_page_zap_pte(kvm, parent_sp, sptep, &invalid_list);
> -               kvm_mmu_remote_flush_or_zap(kvm, &invalid_list, true);
> +               if (mmu_page_zap_pte(kvm, parent_sp, sptep, &invalid_list))
> +                       kvm_mmu_commit_zap_page(kvm, &invalid_list);
> +               else if (flush)
> +                       kvm_flush_remote_tlbs_sptep(kvm, sptep);

Duh, this is obvious in hindsight.

Reviewed-by: Sean Christopherson <seanjc@google.com>
* Re: [PATCH] KVM: x86: use again the flush argument of __link_shadow_page()

From: Sean Christopherson @ 2026-05-04 18:36 UTC
To: Paolo Bonzini; +Cc: linux-kernel, kvm

On Mon, May 04, 2026, Sean Christopherson wrote:
> x86/mmu
>
> On Sun, May 03, 2026, Paolo Bonzini wrote:
> > Except in the case of parentless nested-TDP pages, mmu_page_zap_pte()
> > clears the SPTE but leaves the invalid_list empty.  In this case, using
> > kvm_flush_remote_tlbs() as kvm_mmu_remote_flush_or_zap() does is overkill.
> > Avoid flushing the entirety of the remote TLBs unless the invalid_list
> > was populated: instead, use a more efficient gfn-targeting flush (if
> > available) and skip it altogether if the caller guarantees that a TLB
> > flush is not necessary.
> >
> > Based-on: <20260503201029.106481-1-pbonzini@redhat.com>
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> >  arch/x86/kvm/mmu/mmu.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 892246204435..85bec8eeace8 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -2541,8 +2541,10 @@ static void __link_shadow_page(struct kvm *kvm,
> >                 parent_sp = sptep_to_sp(sptep);
> >                 WARN_ON_ONCE(parent_sp->role.level == PG_LEVEL_4K);
> >
> > -               mmu_page_zap_pte(kvm, parent_sp, sptep, &invalid_list);
> > -               kvm_mmu_remote_flush_or_zap(kvm, &invalid_list, true);
> > +               if (mmu_page_zap_pte(kvm, parent_sp, sptep, &invalid_list))
> > +                       kvm_mmu_commit_zap_page(kvm, &invalid_list);
> > +               else if (flush)
> > +                       kvm_flush_remote_tlbs_sptep(kvm, sptep);
>
> Duh, this is obvious in hindsight.
>
> Reviewed-by: Sean Christopherson <seanjc@google.com>

An amendment to that: I thought this was just switching back to the more
targeted range-based flush; I didn't realize you applied the version that
hardcoded the @flush param to kvm_mmu_remote_flush_or_zap() with "true".

If you take this through kvm.git directly, can you add this comment?

        /*
         * Note! @flush from the caller doesn't follow KVM's standard
         * "collect TLB flushes in a variable to batch them" pattern.
         * In this case, @flush is used to communicate whether or not a
         * TLB flush is needed *now*, and specifically only impacts the
         * case where a huge SPTE is replaced with a shadow page SPTE
         * (KVM always flushes if a shadow page SPTE is zapped).
         *
         * When splitting a hugepage and the new shadow page is fully
         * populated, i.e. every child SPTE is shadow-present and thus
         * the total mappings are functionally identical, KVM can defer
         * the TLB flush (until the ioctl completes) as no memory has
         * been unmapped, and all mappings are still reachable, e.g. so
         * that future mmu_notifier invalidations are guaranteed to
         * flush the affected range if relevant mappings are zapped.
         */

If you're expecting me to grab this, I'll add the comment when applying.
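For readers unfamiliar with the convention Sean cites: the "collect TLB
flushes in a variable to batch them" pattern is the idiom used throughout
KVM's shadow MMU, roughly as sketched below.  This is illustrative pseudo-C
only; for_each_entry_in_range() and zap_one_entry() are placeholder names,
not real KVM helpers.

        bool flush = false;

        /* Each helper returns true if the caller eventually needs to flush. */
        for_each_entry_in_range(...)
                flush |= zap_one_entry(kvm, ...);

        /* One batched flush at the end, instead of one per zapped SPTE. */
        if (flush)
                kvm_flush_remote_tlbs(kvm);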
* Re: [PATCH] KVM: x86: use again the flush argument of __link_shadow_page()

From: Paolo Bonzini @ 2026-05-05 11:52 UTC
To: Sean Christopherson; +Cc: linux-kernel, kvm

On Mon, May 4, 2026 at 8:36 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, May 04, 2026, Sean Christopherson wrote:
> > x86/mmu
> >
> > On Sun, May 03, 2026, Paolo Bonzini wrote:
> > > Except in the case of parentless nested-TDP pages, mmu_page_zap_pte()
> > > clears the SPTE but leaves the invalid_list empty.  In this case, using
> > > kvm_flush_remote_tlbs() as kvm_mmu_remote_flush_or_zap() does is overkill.
> > > Avoid flushing the entirety of the remote TLBs unless the invalid_list
> > > was populated: instead, use a more efficient gfn-targeting flush (if
> > > available) and skip it altogether if the caller guarantees that a TLB
> > > flush is not necessary.
> > >
> > > Based-on: <20260503201029.106481-1-pbonzini@redhat.com>
> > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > > ---
> > >  arch/x86/kvm/mmu/mmu.c | 6 ++++--
> > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > index 892246204435..85bec8eeace8 100644
> > > --- a/arch/x86/kvm/mmu/mmu.c
> > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > @@ -2541,8 +2541,10 @@ static void __link_shadow_page(struct kvm *kvm,
> > >                 parent_sp = sptep_to_sp(sptep);
> > >                 WARN_ON_ONCE(parent_sp->role.level == PG_LEVEL_4K);
> > >
> > > -               mmu_page_zap_pte(kvm, parent_sp, sptep, &invalid_list);
> > > -               kvm_mmu_remote_flush_or_zap(kvm, &invalid_list, true);
> > > +               if (mmu_page_zap_pte(kvm, parent_sp, sptep, &invalid_list))
> > > +                       kvm_mmu_commit_zap_page(kvm, &invalid_list);
> > > +               else if (flush)
> > > +                       kvm_flush_remote_tlbs_sptep(kvm, sptep);
> >
> > Duh, this is obvious in hindsight.
> >
> > Reviewed-by: Sean Christopherson <seanjc@google.com>
>
> An amendment to that: I thought this was just switching back to the more
> targeted range-based flush; I didn't realize you applied the version that
> hardcoded the @flush param to kvm_mmu_remote_flush_or_zap() with "true".

Yes, I was hoping to simplify stable backports a bit.  Here is my
version of your comment (yes, I did try adding batched zapping to
shadow_mmu_split_huge_page() and failed):

        /*
         * Note: while normally KVM uses a "bool flush" return value to let
         * the caller batch flushes, __link_shadow_page() flushes immediately
         * before populating the parent PTE with the new shadow page.
         * The typical callers, direct_map() and FNAME(fetch)(), are not going
         * to zap more than one large SPTE anyway.
         *
         * The only exception, where @flush can be false, is when a large SPTE
         * is replaced with a large SPTE with a fully populated page table,
         * which can happen from shadow_mmu_split_huge_page().  In this case,
         * no memory is unmapped across the change to the page tables and no
         * immediate flush is needed for correctness.
         *
         * Even in that case, calls to kvm_mmu_commit_zap_page() are not
         * batched.  Doing so would require adding an invalid_list argument
         * all the way down to __walk_slot_rmaps().
         */

What do you think?

Paolo
* Re: [PATCH] KVM: x86: use again the flush argument of __link_shadow_page()

From: Sean Christopherson @ 2026-05-07 19:13 UTC
To: Paolo Bonzini; +Cc: linux-kernel, kvm

On Tue, May 05, 2026, Paolo Bonzini wrote:
> On Mon, May 4, 2026 at 8:36 PM Sean Christopherson <seanjc@google.com> wrote:
> > An amendment to that: I thought this was just switching back to the more
> > targeted range-based flush; I didn't realize you applied the version that
> > hardcoded the @flush param to kvm_mmu_remote_flush_or_zap() with "true".
>
> Yes, I was hoping to simplify stable backports a bit.  Here is my
> version of your comment (yes, I did try adding batched zapping to
> shadow_mmu_split_huge_page() and failed):
>
>         /*
>          * Note: while normally KVM uses a "bool flush" return value to let
>          * the caller batch flushes, __link_shadow_page() flushes immediately
>          * before populating the parent PTE with the new shadow page.
>          * The typical callers, direct_map() and FNAME(fetch)(), are not going
>          * to zap more than one large SPTE anyway.
>          *
>          * The only exception, where @flush can be false, is when a large SPTE

Can we use "huge" instead of "large"?  KVM has lots of references to both,
but the kernel generally uses "huge", so I've been trying to opportunistically
switch to "huge".

>          * is replaced with a large SPTE with a fully populated page table,

I think you want "a shadow page SPTE" here?

>          * which can happen from shadow_mmu_split_huge_page().  In this case,
>          * no memory is unmapped across the change to the page tables and no
>          * immediate flush is needed for correctness.
>          *
>          * Even in that case, calls to kvm_mmu_commit_zap_page() are not
>          * batched.  Doing so would require adding an invalid_list argument
>          * all the way down to __walk_slot_rmaps().
>          */
>
> What do you think?

Other than the nit and the goof, LGTM.
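For the reader's convenience, folding Sean's nit (s/large/huge/) and the goof
fix ("a shadow page SPTE") into Paolo's draft yields roughly the comment
below.  This is a reconstruction from the thread, not text taken from an
applied commit:

        /*
         * Note: while normally KVM uses a "bool flush" return value to let
         * the caller batch flushes, __link_shadow_page() flushes immediately
         * before populating the parent PTE with the new shadow page.
         * The typical callers, direct_map() and FNAME(fetch)(), are not going
         * to zap more than one huge SPTE anyway.
         *
         * The only exception, where @flush can be false, is when a huge SPTE
         * is replaced with a shadow page SPTE with a fully populated page
         * table, which can happen from shadow_mmu_split_huge_page().  In this
         * case, no memory is unmapped across the change to the page tables
         * and no immediate flush is needed for correctness.
         *
         * Even in that case, calls to kvm_mmu_commit_zap_page() are not
         * batched.  Doing so would require adding an invalid_list argument
         * all the way down to __walk_slot_rmaps().
         */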