From: Sean Christopherson <seanjc@google.com>
To: Fred Griffoul <griffoul@gmail.com>
Cc: kvm@vger.kernel.org, pbonzini@redhat.com, vkuznets@redhat.com,
shuah@kernel.org, dwmw@amazon.co.uk,
linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org,
Fred Griffoul <fgriffo@amazon.co.uk>
Subject: Re: [PATCH v4 04/10] KVM: nVMX: Implement cache for L1 APIC pages
Date: Mon, 11 May 2026 16:35:06 -0700 [thread overview]
Message-ID: <agJnqh7ACEkB-ftx@google.com> (raw)
In-Reply-To: <20260102142429.896101-5-griffoul@gmail.com>
On Fri, Jan 02, 2026, Fred Griffoul wrote:
> From: Fred Griffoul <fgriffo@amazon.co.uk>
>
> Replace kvm_host_map usage with gfn_to_pfn_cache for L1 APIC
> virtualization pages (APIC access, virtual APIC, and posted interrupt
> descriptor pages) to improve performance with unmanaged guest memory.
>
> The conversion involves several changes:
>
> - Page loading in nested_get_vmcs12_pages(): load vmcs02 fields with
> pfncache PFNs after each cache has been checked and possibly activated
> or refreshed, during OUTSIDE_GUEST_MODE vCPU mode.
>
> - Invalidation window handling: since nested_get_vmcs12_pages() runs in
> OUTSIDE_GUEST_MODE, there's a window where caches can be invalidated
> by MMU notifications before entering IN_GUEST_MODE. implement
> is_nested_state_invalid() callback to monitor cache validity between
> OUTSIDE_GUEST_MODE and IN_GUEST_MODE transitions. This triggers
> KVM_REQ_GET_NESTED_STATE_PAGES when needed.
>
> - Cache access in event callbacks: the virtual APIC and posted interrupt
> descriptor pages are accessed by KVM in has_events() and
> check_events() nested_ops callbacks. These use the kernel HVA following
> the pfncache pattern of check/refresh, with both callbacks able to sleep
> if cache refresh is required.
>
> This eliminates expensive memremap/memunmap cycles for each L2 VM
> entry/exit, providing substantial performance improvements when using
> unmanaged memory.
>
> Signed-off-by: Fred Griffoul <fgriffo@amazon.co.uk>
> ---
> arch/x86/kvm/vmx/nested.c | 182 +++++++++++++++++++++++++++++---------
> arch/x86/kvm/vmx/vmx.h | 8 +-
> include/linux/kvm_host.h | 5 ++
> 3 files changed, 150 insertions(+), 45 deletions(-)
Please split this up; one "cache" per patch. It's a bit gratutious, but the usage
of the vapic and pid structures is just complex enough to make this more than a
straightforward, "boring" conversion.
> @@ -3112,8 +3173,9 @@ static int nested_vmx_check_controls(struct kvm_vcpu *vcpu,
> static int nested_vmx_check_controls_late(struct kvm_vcpu *vcpu,
> struct vmcs12 *vmcs12)
> {
> - void *vapic = to_vmx(vcpu)->nested.virtual_apic_map.hva;
> - u32 vtpr = vapic ? (*(u32 *)(vapic + APIC_TASKPRI)) >> 4 : 0;
> + struct vcpu_vmx *vmx = to_vmx(vcpu);
> + void *vapic;
> + u32 vtpr = 0;
>
> /*
> * Don't bother with the consistency checks if KVM isn't configured to
> @@ -3130,6 +3192,12 @@ static int nested_vmx_check_controls_late(struct kvm_vcpu *vcpu,
> if (!warn_on_missed_cc)
> return 0;
>
> + vapic = nested_lock_vapic(vmx);
> + if (vapic) {
> + vtpr = (*(u32 *)(vapic + APIC_TASKPRI)) >> 4;
> + nested_unlock_vapic(vmx);
This is *very* dangerous. vapic will still be a legal kernel pointer, and the
code _relies_ on it to be non-NULL, but the pointer could become stale at any time.
Happily, my CLASS() approach makes this a non-issue.
> + }
> +
> if ((exec_controls_get(to_vmx(vcpu)) & CPU_BASED_TPR_SHADOW) &&
> nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW) &&
> !nested_cpu_has_vid(vmcs12) &&
...
> @@ -3543,7 +3609,16 @@ static bool vmx_get_nested_state_pages(struct kvm_vcpu *vcpu)
>
> static bool vmx_is_nested_state_invalid(struct kvm_vcpu *vcpu)
> {
> - return false;
> + struct vcpu_vmx *vmx = to_vmx(vcpu);
> +
> + /*
> + * @vcpu is in IN_GUEST_MODE, eliminating the need for individual gpc
This is straight up wrong.
gpc->active is effectively protected by vcpu->mutex.
And for gpc->valid, it's not completely wrong, but it's partially wrong and
definitely misleading. It too is protected by vcpu->mutex, but only for %false
=> %true transitions. And even for %true => %false transitions, IN_GUEST_MODE
doesn't exactly provide safety, rather it ensures that KVM will to kick the vCPU
out of the guest before completing the invalidation.
But that's all moot, because the very existence of this code feels wrong. KVM
should be sending a "no action" request and then requiring vendor code to check
out-of-band data to proxy that into a "real" request.
Presumably older code that you're reverting used KVM_REQ_OUTSIDE_GUEST_MODE;
that doesn't necessary mean it was a good idea, it got yanked out for a reason :-)
IMO, the least awful way to deal with this is to tag KVM_REQ_GET_NESTED_STATE_PAGES
with KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP, and then pass the specific request
that needs to be made when invalidating a vCPU-mapped gpc as part of
__kvm_gpc_init() (though maybe call it vcpu_gpc_init() instead?). That way KVM
doesn't need this wonky request => kvm_gpc_invalid() => request proxying, and we
should be avoid to entirely avoid the confusingly-named kvm_gpc_invalid().
next prev parent reply other threads:[~2026-05-11 23:35 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-01-02 14:24 [PATCH v4 00/10] KVM: nVMX: Improve performance for unmanaged guest memory Fred Griffoul
2026-01-02 14:24 ` [PATCH v4 01/10] KVM: nVMX: Implement cache for L1 MSR bitmap Fred Griffoul
2026-05-11 23:08 ` Sean Christopherson
2026-01-02 14:24 ` [PATCH v4 02/10] KVM: pfncache: Restore guest-uses-pfn support Fred Griffoul
2026-01-02 14:24 ` [PATCH v4 03/10] KVM: x86: Add nested state validation for pfncache support Fred Griffoul
2026-01-02 14:24 ` [PATCH v4 04/10] KVM: nVMX: Implement cache for L1 APIC pages Fred Griffoul
2026-05-11 23:35 ` Sean Christopherson [this message]
2026-01-02 14:24 ` [PATCH v4 05/10] KVM: selftests: Add nested VMX APIC cache invalidation test Fred Griffoul
2026-01-02 14:24 ` [PATCH v4 06/10] KVM: nVMX: Cache evmcs fields to ensure consistency during VM-entry Fred Griffoul
2026-01-02 15:40 ` Vitaly Kuznetsov
2026-01-02 14:24 ` [PATCH v4 07/10] KVM: nVMX: Replace evmcs kvm_host_map with pfncache Fred Griffoul
2026-01-02 14:24 ` [PATCH v4 08/10] KVM: x86: Add nested context management Fred Griffoul
2026-05-12 0:13 ` Sean Christopherson
2026-01-02 14:24 ` [PATCH v4 09/10] KVM: nVMX: Use nested context for pfncache persistence Fred Griffoul
2026-01-02 14:24 ` [PATCH v4 10/10] KVM: selftests: Add L2 vcpu context switch test Fred Griffoul
2026-05-11 23:56 ` [PATCH v4 00/10] KVM: nVMX: Improve performance for unmanaged guest memory Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=agJnqh7ACEkB-ftx@google.com \
--to=seanjc@google.com \
--cc=dwmw@amazon.co.uk \
--cc=fgriffo@amazon.co.uk \
--cc=griffoul@gmail.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=shuah@kernel.org \
--cc=vkuznets@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox