Re: [PATCH v4 01/10] KVM: nVMX: Implement cache for L1 MSR bitmap

Linux Kernel Selftest development
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: Fred Griffoul <griffoul@gmail.com>
Cc: kvm@vger.kernel.org, pbonzini@redhat.com, vkuznets@redhat.com,
	 shuah@kernel.org, dwmw@amazon.co.uk,
	linux-kselftest@vger.kernel.org,  linux-kernel@vger.kernel.org,
	Fred Griffoul <fgriffo@amazon.co.uk>
Subject: Re: [PATCH v4 01/10] KVM: nVMX: Implement cache for L1 MSR bitmap
Date: Mon, 11 May 2026 16:08:38 -0700	[thread overview]
Message-ID: <agJhdkVOPcGVvfjP@google.com> (raw)
In-Reply-To: <20260102142429.896101-2-griffoul@gmail.com>

On Fri, Jan 02, 2026, Fred Griffoul wrote:
> From: Fred Griffoul <fgriffo@amazon.co.uk>
> 
> Introduce a gfn_to_pfn_cache to optimize L1 MSR bitmap access by
> replacing map/unmap operations. This optimization reduces overhead
> during L2 VM-entry where nested_vmx_prepare_msr_bitmap() merges L1's MSR
> intercepts with L0's requirements.
> 
> Current implementation using kvm_vcpu_map_readonly() and
> kvm_vcpu_unmap() creates significant performance impact, mostly with
> unmanaged guest memory.
> 
> The cache is initialized when entering VMX operation and deactivated
> when VMX operation ends.
> 
> Signed-off-by: Fred Griffoul <fgriffo@amazon.co.uk>
> ---
>  arch/x86/kvm/vmx/nested.c | 42 +++++++++++++++++++++++++++++++++++----
>  arch/x86/kvm/vmx/vmx.h    |  2 ++
>  2 files changed, 40 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 6137e5307d0f..f05828aca7e5 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -316,6 +316,34 @@ static void vmx_switch_vmcs(struct kvm_vcpu *vcpu, struct loaded_vmcs *vmcs)
>  	vcpu->arch.regs_dirty = 0;
>  }
>  
> +/*
> + * Maps a single guest page starting at @gpa and lock the cache for access.
> + */
> +static int nested_gpc_lock(struct gfn_to_pfn_cache *gpc, gpa_t gpa)

These helpers belong in virt/kvm/pfncache.c, not in vmx/nested.c.  The names also
need to be changed (which is super obvious when they land in pfncache.c).  Nested
locks are very much a thing in the kernel; even with the nVXM context, these read
as "lock a GPC within another GPC critical section".

And this does waaaay more than just "lock" the gpc.  E.g. by the end of the series
this picks up:

		err = kvm_gpc_activate(gpc, gpa, PAGE_SIZE);
		if (err) {
			/*
			 * Deactivate nested state caches to prevent
			 * kvm_gpc_invalid() from returning true in subsequent
			 * is_nested_state_invalid() calls. This prevents an
			 * infinite loop while entering guest mode.
			 */
			if (gpc->vcpu)
				kvm_gpc_deactivate(gpc);

			return err;
		}

That's a pretty gross hack, as is the addition and use of kvm_gpc_invalid() to
detect that a vmcs12 page has been invalidated (though that may just be a naming
issue).

The use of nested_gpc_lock() in nested_vmx_prepare_msr_bitmap() and
nested_vmx_handle_enlightened_vmptrld() also feels forced.  The usage there is
_much_ more aligned with nested_gpc_lock_if_active(), in the sense that it's a
temporary establishing a host mapping versus grabbing a pfn that gets shoved into
a non-MMU structure.

> +{
> +	int err;
> +
> +	if (!PAGE_ALIGNED(gpa))
> +		return -EINVAL;
> +retry:
> +	read_lock(&gpc->lock);
> +	if (!kvm_gpc_check(gpc, PAGE_SIZE) || (gpc->gpa != gpa)) {

This order of thsese checks is bizarre.  It's not problematic per se, but it's
jarring to read.

> +		read_unlock(&gpc->lock);
> +		err = kvm_gpc_activate(gpc, gpa, PAGE_SIZE);

And the use of kvm_gpc_activate() in the retry loop is also bizarre.  It's "fine",
but it makes the lifecycle of the code hard to follow.

The other thing that stands out in later patches is the error handling.  I really
dislike the number of unlock() calls that are being added.

The good news is, I think pretty much all of my complaints are symptoms of major
gaps in the core gpc APIs.  E.g. you're smushing largely unrelated things into
nested_gpc_lock() because the retry loop is fugly.  If we add better APIs, then
the nVMX code shouldn't need to combine different concepts in weird ways, just
to avoid copy+pasting an ugly loop.

Happily, the majority of that can be sorted out in advance of converting nVMX.
E.g. instead of the stealtime conversion looking like so:

	if (!read_trylock(&gpc->lock))
		return;

	if (!kvm_gpc_check(gpc, sizeof(*st)))
		goto out_unlock_gpc;

	st = (struct kvm_steal_time *)gpc->khva;
	WRITE_ONCE(st->preempted, preempted);
	vcpu->arch.st.preempted = KVM_VCPU_PREEMPTED;

	kvm_gpc_mark_dirty_in_slot(gpc);

  out_unlock_gpc:
	read_unlock(&gpc->lock);

it can instead be something like:

	CLASS(gpc_try_map_local, st_map)(gpc, sizeof(st->preempted));
	if (IS_ERR(st_map))
		return;

	st = *st_map;
	WRITE_ONCE(st->preempted, KVM_VCPU_PREEMPTED);
	vcpu->arch.st.preempted = KVM_VCPU_PREEMPTED;

I have patches that appear to work.  I'll post a new version of David's series[*]
with the new APIs, and we can go from there (dropping the IRQ interactions makes
adding the CLASS() stuff easier, and you'll have to rebase regardless).

[*] https://lore.kernel.org/all/20260508181717.3230988-1-dwmw2@infradead.org

> +		if (err)
> +			return err;
> +
> +		goto retry;
> +	}
> +
> +	return 0;
> +}
> +
> +static void nested_gpc_unlock(struct gfn_to_pfn_cache *gpc)

This is a completely superfluous helper.  Looking ahead, many of the wrappers are
similarly useless, and IMO do more harm than good.  E.g. having bounce through
multiple implementations for something like this:

	if (!pi_test_and_clear_on(pi_desc)) {
		nested_unlock_pi_desc(vmx);
		return 0;
	}

is *really* frustrating when it's nothing more than:

	if (!pi_test_and_clear_on(pi_desc)) {
		read_unlock(&vmx->nested.pi_desc_cache->lock);
		return 0;
	}

All of that should be a moot point with better APIs, so this really "in the
future..." type of feedback.

next prev parent reply	other threads:[~2026-05-11 23:08 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-02 14:24 [PATCH v4 00/10] KVM: nVMX: Improve performance for unmanaged guest memory Fred Griffoul
2026-01-02 14:24 ` [PATCH v4 01/10] KVM: nVMX: Implement cache for L1 MSR bitmap Fred Griffoul
2026-05-11 23:08   ` Sean Christopherson [this message]
2026-01-02 14:24 ` [PATCH v4 02/10] KVM: pfncache: Restore guest-uses-pfn support Fred Griffoul
2026-01-02 14:24 ` [PATCH v4 03/10] KVM: x86: Add nested state validation for pfncache support Fred Griffoul
2026-01-02 14:24 ` [PATCH v4 04/10] KVM: nVMX: Implement cache for L1 APIC pages Fred Griffoul
2026-05-11 23:35   ` Sean Christopherson
2026-01-02 14:24 ` [PATCH v4 05/10] KVM: selftests: Add nested VMX APIC cache invalidation test Fred Griffoul
2026-01-02 14:24 ` [PATCH v4 06/10] KVM: nVMX: Cache evmcs fields to ensure consistency during VM-entry Fred Griffoul
2026-01-02 15:40   ` Vitaly Kuznetsov
2026-01-02 14:24 ` [PATCH v4 07/10] KVM: nVMX: Replace evmcs kvm_host_map with pfncache Fred Griffoul
2026-01-02 14:24 ` [PATCH v4 08/10] KVM: x86: Add nested context management Fred Griffoul
2026-05-12  0:13   ` Sean Christopherson
2026-01-02 14:24 ` [PATCH v4 09/10] KVM: nVMX: Use nested context for pfncache persistence Fred Griffoul
2026-01-02 14:24 ` [PATCH v4 10/10] KVM: selftests: Add L2 vcpu context switch test Fred Griffoul
2026-05-11 23:56 ` [PATCH v4 00/10] KVM: nVMX: Improve performance for unmanaged guest memory Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=agJhdkVOPcGVvfjP@google.com \
    --to=seanjc@google.com \
    --cc=dwmw@amazon.co.uk \
    --cc=fgriffo@amazon.co.uk \
    --cc=griffoul@gmail.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=shuah@kernel.org \
    --cc=vkuznets@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox