From: Sean Christopherson <seanjc@google.com>
To: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: linux-kernel@vger.kernel.org,
Lai Jiangshan <jiangshan.ljs@antgroup.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Thomas Gleixner <tglx@kernel.org>, Ingo Molnar <mingo@redhat.com>,
Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
kvm@vger.kernel.org
Subject: Re: [PATCH 2/2] KVM: x86/mmu: KVM: x86/mmu: Skip unsync when large pages are allowed
Date: Thu, 12 Mar 2026 10:07:57 -0700
Message-ID: <abLy7cEDz7VlWtWS@google.com>
In-Reply-To: <20260123090304.32286-2-jiangshanlai@gmail.com>

On Fri, Jan 23, 2026, Lai Jiangshan wrote:
> From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
>
> Use the large-page metadata to avoid pointless attempts to search SP.
>
> If the target GFN falls within a range where a large page is allowed,
> then there cannot be a shadow page for that GFN; a shadow page in the
> range would itself disallow using a large page. In that case, there
> is nothing to unsync and mmu_try_to_unsync_pages() can return
> immediately.
>
> This is always true for TDP MMU without nested TDP,

I wouldn't expect this to be much of a performance optimization for this case
though, as kvm_get_mmu_page_hash() will return an empty list, i.e.
for_each_gfn_valid_sp_with_gptes() won't do meaningful work anyways.

> and holds for a significant fraction of cases with shadow paging even all SPs
> are 4K.
>
> For shadow paging, this optimization theoretically avoids work for about
> 1/e ~= 37% of GFNs, assuming one guest page table per 2M of memory and
> that each GPT falls randomly into the 2M memory buckets. In a simple
> test setup, it skipped unsync in a much higher percentage of cases,
> mainly because the guest buddy allocator clusters GPTs into fewer
> buckets.
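
For anyone who wants to sanity-check the 1/e figure quoted above: under the
stated assumption (n guest page tables, each landing uniformly at random in
one of n 2M buckets), the chance that a given bucket holds no GPT is
(1 - 1/n)^n, which tends to 1/e as n grows. A minimal sketch of that
arithmetic; empty_bucket_fraction() is a made-up helper for illustration, not
kernel code, and it says nothing about the clustered case measured in the
test setup.

```c
/*
 * Hypothetical helper, NOT kernel code: probability that one particular 2M
 * bucket receives none of n guest page tables when each GPT independently
 * lands in one of n buckets uniformly at random, i.e. (1 - 1/n)^n.
 */
static double empty_bucket_fraction(unsigned long n)
{
	double p = 1.0;
	unsigned long i;

	/* Multiply the per-GPT "missed this bucket" probability n times. */
	for (i = 0; i < n; i++)
		p *= 1.0 - 1.0 / (double)n;

	return p;
}
```

E.g. empty_bucket_fraction(512), i.e. 1GiB of guest memory at one GPT per
2MiB, is already within a fraction of a percent of 1/e.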
>
> Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> ---
> arch/x86/kvm/mmu/mmu.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 4535d2836004..555075fb63d9 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2932,6 +2932,14 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
>  	struct kvm_mmu_page *sp;
>  	bool locked = false;
>  
> +	/*
> +	 * If large page is allowed, there is no shadow page in the GFN range,
> +	 * because the presence of a shadow page in that range would prevent
> +	 * using a large page.
> +	 */
> +	if (!lpage_info_slot(gfn, slot, PG_LEVEL_2M)->disallow_lpage)
> +		return 0;
> +

Hmm, I'd like to move this to after the write-tracking check, even though, as
implemented in code today, the two checks are mutually exclusive. Specifically,
I don't want to rely on KVM not supporting write-tracking at 2MiB granularity,
and I also want to avoid confusing readers. E.g. a shallow read of
account_shadowed() would lead people to believe this code is wrong:

	/* the non-leaf shadow pages are keeping readonly. */
	if (sp->role.level > PG_LEVEL_4K)
		return __kvm_write_track_add_gfn(kvm, slot, gfn);

	kvm_mmu_gfn_disallow_lpage(slot, gfn);

if they didn't follow __kvm_write_track_add_gfn() to see:

	/*
	 * new track stops large page mapping for the
	 * tracked page.
	 */
	kvm_mmu_gfn_disallow_lpage(slot, gfn);

From a performance perspective, kvm_gfn_is_write_tracked() is O(1) time, and
should be very fast for the "pure" TDP MMU case, so I don't think that's a
concern.
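
To spell out why both account_shadowed() paths end up disallowing hugepages,
here is a toy model of the accounting. Everything prefixed toy_ is invented
for illustration, and the model collapses a gfn and its 2MiB region into a
single struct, which the real code does not do (write-tracking is per-gfn,
disallow_lpage is per hugepage region). Treat it as a sketch of the invariant,
not of KVM's data structures.

```c
#include <stdbool.h>

/* Toy stand-in for one gfn plus its 2MiB region; NOT KVM's structures. */
struct toy_region {
	int write_track_count;	/* models the gfn's write-track count */
	int disallow_lpage;	/* models lpage_info->disallow_lpage */
};

/* Models __kvm_write_track_add_gfn(): tracking also disallows hugepages. */
static void toy_write_track_add_gfn(struct toy_region *r)
{
	r->write_track_count++;
	r->disallow_lpage++;
}

/* Models account_shadowed(); level 1 plays the role of PG_LEVEL_4K. */
static void toy_account_shadowed(struct toy_region *r, int level)
{
	if (level > 1) {
		/* Non-leaf SP: write-track, which implies no hugepage. */
		toy_write_track_add_gfn(r);
		return;
	}

	/* Leaf (4KiB) SP: only the hugepage is disallowed. */
	r->disallow_lpage++;
}

static bool toy_is_write_tracked(const struct toy_region *r)
{
	return r->write_track_count > 0;
}

static bool toy_lpage_allowed(const struct toy_region *r)
{
	return r->disallow_lpage == 0;
}
```

In this model, a region with any accounted shadow page never reports
toy_lpage_allowed(), regardless of which branch did the accounting, which is
the property the fast path relies on.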

This is what I have locally; please holler if you object to landing the new
check after the write-tracked check.

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 363967a17069..3d0e0c1b5332 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2940,6 +2940,15 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
 	if (kvm_gfn_is_write_tracked(kvm, slot, gfn))
 		return -EPERM;
 
+	/*
+	 * Only 4KiB mappings can become unsync, and KVM disallows hugepages
+	 * for unsync gfns.  Upper-level gPTEs (leaf or non-leaf) are always
+	 * write-protected (see above), thus if the gfn can be mapped with a
+	 * hugepage and isn't write-tracked, it can't be unsync.
+	 */
+	if (!lpage_info_slot(gfn, slot, PG_LEVEL_2M)->disallow_lpage)
+		return 0;
+
 	/*
 	 * The page is not write-tracked, mark existing shadow pages unsync
 	 * unless KVM is synchronizing an unsync SP.  In that case, KVM must
>  	/*
>  	 * Force write-protection if the page is being tracked.  Note, the page
>  	 * track machinery is used to write-protect upper-level shadow pages,
> --
> 2.19.1.6.gb485710b
>
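
As a rough illustration of why the ordering matters, the decision at the top
of mmu_try_to_unsync_pages() with the check placed after the write-tracking
test can be modeled as below. The enum and function names are hypothetical,
and the two booleans stand in for kvm_gfn_is_write_tracked() and
!lpage_info_slot(...)->disallow_lpage; this is a sketch of the control flow
only, not of the shadow page walk.

```c
#include <stdbool.h>

/* Hypothetical labels for the three outcomes; not kernel identifiers. */
enum toy_unsync_path {
	TOY_WRITE_TRACKED,	/* real code returns -EPERM */
	TOY_HUGEPAGE_FAST_PATH,	/* real code returns 0 without walking SPs */
	TOY_WALK_SHADOW_PAGES,	/* real code walks the gfn's shadow pages */
};

static enum toy_unsync_path toy_unsync_path(bool write_tracked,
					    bool lpage_allowed)
{
	/* Write-tracking is checked first and always wins. */
	if (write_tracked)
		return TOY_WRITE_TRACKED;

	/* Not tracked and a hugepage is allowed: nothing can be unsync. */
	if (lpage_allowed)
		return TOY_HUGEPAGE_FAST_PATH;

	return TOY_WALK_SHADOW_PAGES;
}
```

The (write_tracked, lpage_allowed) = (true, true) combination can't happen
today, but with this ordering it would still be treated as write-tracked if
write-tracking ever stopped implying that hugepages are disallowed.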
Thread overview: 5+ messages
2026-01-23 9:03 [PATCH 1/2] KVM: x86/mmu: Don't check old SPTE permissions when trying to unsync Lai Jiangshan
2026-01-23 9:03 ` [PATCH 2/2] KVM: x86/mmu: KVM: x86/mmu: Skip unsync when large pages are allowed Lai Jiangshan
2026-03-12 17:07 ` Sean Christopherson [this message]
2026-03-12 17:22 ` Sean Christopherson
2026-03-12 17:35 ` [PATCH 1/2] KVM: x86/mmu: Don't check old SPTE permissions when trying to unsync Sean Christopherson